Networked Systems Tech Talk series at Illinois Tech

This is a semi-regular, remote-friendly talk series focused on data networks.

How to get notified about upcoming talks:

For enquiries related to this talk series contact Nik Sultana.


Time: 12:45-13:45 Central Time (Chicago/US)

Abstract: We introduce IONIA, a novel replication protocol tailored for modern SSD-based write-optimized key-value (WOKV) stores. Unlike existing replication approaches, IONIA carefully exploits the unique characteristics of SSD-based WOKV stores. First, it exploits their interface characteristics to defer parallel execution to the background, enabling high-throughput writes that complete in a single round trip (1 RTT). IONIA also exploits SSD-based KV stores’ performance characteristics to scalably read at any replica without enforcing writes to all replicas, thus providing scalability without compromising write availability; further, it does so while completing most reads in 1 RTT. IONIA is the first protocol to achieve these properties, and it does so through its storage-aware design. We evaluate IONIA extensively to show that it achieves the above properties under a variety of workloads.
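The deferred-execution idea can be sketched in a few lines (a toy model, not IONIA's actual protocol; all names here are illustrative): a write is acknowledged once it has been appended to each replica's log, while execution into the KV store happens in a later background step.

```python
# Toy sketch (not IONIA's actual protocol): acknowledge a write after one
# round trip of log appends; defer execution into the KV store.
from collections import deque

class Replica:
    def __init__(self):
        self.log = deque()   # durably appended writes (cheap, sequential)
        self.kv = {}         # write-optimized store, updated lazily

    def append(self, key, value):
        # Fast path: append-only logging is cheap on SSDs.
        self.log.append((key, value))

    def apply_pending(self):
        # Background path: execute deferred writes into the KV store.
        while self.log:
            key, value = self.log.popleft()
            self.kv[key] = value

def replicated_write(replicas, key, value):
    # One "round trip": every replica logs the write, then the client is
    # acknowledged; no replica has executed the write yet.
    for r in replicas:
        r.append(key, value)
    return "ack"

replicas = [Replica() for _ in range(3)]
replicated_write(replicas, "k", "v1")
assert all(r.kv == {} for r in replicas)           # logged, not executed
for r in replicas:
    r.apply_pending()                              # background execution
assert all(r.kv["k"] == "v1" for r in replicas)
```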

Bio: Henry Zhu is a second-year PhD student at UIUC, advised by Ram Alagappan and Aishwarya Ganesan, working on the intersection of caching and storage systems.

Time: 11:00-12:00 Central Time (Chicago/US)

Slides | Recording

Abstract: In this presentation, we’ll explore how to perform packet processing efficiently on off-the-shelf CPUs using the Vector Packet Processing (VPP) framework (https://s3-docs.fd.io/vpp/24.06/). We’ll examine common software bottlenecks and how to measure and optimize them in the context of software data planes.
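The core idea behind vectorized packet processing can be illustrated with a toy sketch (plain Python, not VPP code; the node names and batch size are hypothetical): each graph node processes a whole batch of packets before the next node runs, amortizing per-packet overhead and improving instruction-cache locality.

```python
# Illustrative sketch of vector processing (not VPP code): push a batch
# of packets through each processing node before moving to the next node.
def ethernet_input(batch):
    # Hypothetical node: strip a 14-byte Ethernet header from each packet.
    return [p[14:] for p in batch]

def ip4_lookup(batch):
    # Hypothetical node: tag each packet with a next hop.
    return [(p, "nh0") for p in batch]

def process_vector(packets, nodes, vector_size=256):
    out = []
    for i in range(0, len(packets), vector_size):
        batch = packets[i:i + vector_size]
        for node in nodes:       # each node runs once per batch,
            batch = node(batch)  # not once per packet
        out.extend(batch)
    return out

pkts = [bytes(14) + b"payload%d" % i for i in range(3)]
result = process_vector(pkts, [ethernet_input, ip4_lookup])
assert result[0] == (b"payload0", "nh0")
```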

Bio: Guillaume Solignac is a software engineer at Cisco Meraki, where he works on the next generation of firmware for their security and SD-WAN appliances, with a strong focus on data plane performance. A big part of his job consists of leveraging VPP to forward packets as efficiently as possible on their platforms.

Time: 13:00-14:00 Central Time (Chicago/US)

Slides | Recording

Abstract: Each year a dedicated group of over 200 volunteers gathers to design, build, operate, and tear down SCinet: the networking infrastructure that supports the SC conference. SC draws nearly 14,000 attendees and is the premier conference for high performance computing, networking, and other technological advancements. This talk will discuss how SCinet takes on this herculean task yearly, and offer some previews of SC24 in Atlanta.

Bios: Angie Asmus is the Director of Network & Telecommunications at Colorado State University. She has 25 years of experience in IT across both private and public sectors. She holds a Bachelor of Science degree from Iowa State University and an MBA from Colorado State University. Angie is the SCinet chair for SC24 in Atlanta, GA.
Jason Zurawski is a Science Engagement Engineer with Lawrence Berkeley National Laboratory (LBNL) and the Energy Sciences Network (ESnet). Jason has over 20 years of experience in the R&E networking community, and graduated with a Bachelor of Science degree from the Pennsylvania State University and a Master of Science degree from the University of Delaware. Jason was a founding member of the perfSONAR project and the Science DMZ network architecture.

Time: 12:45-13:45 Central Time (Chicago/US)

Abstract: As the sole inter-domain routing protocol, BGP plays an important role in today’s Internet. Its failures threaten network stability and usually result in large-scale packet losses. Thus, the non-stop routing (NSR) capability, which protects inter-domain connectivity from being disrupted by various failures, is critical to any Autonomous System (AS) operator. Replicating BGP and underlying TCP connection state is key to realizing NSR. But existing NSR solutions, which heavily rely on OS kernel modifications, have become impractical due to providers’ adoption of virtualized network gateways for better scalability and manageability. We tackle this problem by proposing TENSOR, which incorporates a novel kernel-modification-free replication design and a lightweight architecture. More concretely, the kernel-modification-free replication design removes the reliance on OS kernel modification and hence allows the virtualization of the network gateway. Meanwhile, lightweight virtualization provides strong performance guarantees and improves system reliability. Moreover, TENSOR provides a solution to the split-brain problem that affects NSR solutions. Through extensive experiments, we show that TENSOR realizes NSR while bearing little overhead compared to open-source BGP implementations. Further, our two-year operational experience on a fleet of 400 servers controlling over 31,000 BGP peering connections at Tencent demonstrates that TENSOR significantly reduces development, deployment, and maintenance costs – by factors of at least 20, 5, and 10, respectively – while retaining the same SLA as NSR-enabled routers.
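The replication idea behind NSR can be sketched as a toy model (illustrative only; the class and field names are hypothetical, not TENSOR's design): session state is mirrored to a backup entirely in user space before an update is exposed to the peer, so failover can resume from the replicated state without any kernel modification.

```python
# Toy sketch of kernel-modification-free session replication (not TENSOR
# itself): mirror BGP/TCP session state to a backup in user space.
import copy

class SessionState:
    def __init__(self, peer):
        self.peer = peer
        self.tcp_seq = 0          # TCP state tracked in user space
        self.routes = {}          # learned BGP routes

class BackupReplica:
    def __init__(self):
        self.snapshot = None

    def replicate(self, state):
        # Userspace replication: copy state before acking the peer update,
        # so the backup never lags behind what the peer has observed.
        self.snapshot = copy.deepcopy(state)

    def take_over(self):
        return self.snapshot      # resume from the replicated state

primary = SessionState(peer="198.51.100.1")
backup = BackupReplica()

primary.routes["10.0.0.0/8"] = "198.51.100.1"
primary.tcp_seq += 1
backup.replicate(primary)         # replicate BEFORE exposing to the peer

restored = backup.take_over()     # primary fails; backup takes over
assert restored.routes == {"10.0.0.0/8": "198.51.100.1"}
assert restored.tcp_seq == 1
```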

Bio: Yunming Xiao is a final-year PhD candidate at Northwestern University, advised by Prof. Aleksandar Kuzmanovic. He has interned at multiple companies, including HPE Labs, Google, Nokia Bell Labs, and ByteDance, solving various research and software engineering problems. He also has a long-term collaboration with Tencent. Earlier, he obtained his B.Eng. from Beijing University of Posts and Telecommunications, China. He has published papers at networking venues such as ACM SIGCOMM, SIGMETRICS, CoNEXT, WWW, etc. His research interests lie in two primary directions: (i) enhancing the security and privacy measures of Internet services, and (ii) advancing the reliability of data center networks, aiming to minimize downtime.

Time: 12:45-13:45 Central Time (Chicago/US)

Slides | Recording

Abstract: This presentation will describe the StarLight Software Defined Exchange (SDX) within the StarLight International/National Communications Exchange Facility, a nexus of over 120 networks extending to paths world-wide, including over 95×100 Gbps paths and 20×400 Gbps paths. The StarLight Facility is designed, implemented, and operated by a university and national research lab consortium specifically to support large scale global data intensive science. Current information technology is being transformed by innovations creating higher levels of abstraction for infrastructure, enabling unprecedented levels of virtualization and enabling the creation of new types of highly programmable, flexible, fluid, distributed environments. Some of these models have been expressed as Platform as a Service (PaaS), Infrastructure as a Service (IaaS), Software as a Service (SaaS), Anything as a Service (XaaS), and related models. These approaches are motivating a transition from dependencies on limited capability static infrastructure (foundation physical resources with rigid configurations and implementations) to capabilities that allow for large scale, highly distributed programmable environments, including those such as SDXs that adjust dynamically in real time to changes in workflow requirements, flows, and faults. Currently, this innovative architecture is being used by an international partnership to support the Global Research Platform (GRP), a highly distributed environment for data intensive science research. Science is global and open information sharing is a cornerstone of the science process, especially as science utilizes increasingly large volumes of data from around the world.
The GRP is advancing scientific research across multiple domains with services for powerful distributed ecosystems, programmable cyberinfrastructure that integrates high-performance computing, storage, networks, data, analytics, and major instruments world-wide, based on global WANs, including 100G-400G-800G-Tbps-E2E paths.

Bio: Joe Mambretti is Director of the International Center for Advanced Internet Research at Northwestern University, which is developing digital communications for the 21st Century. The Center, which was created in partnership with a number of major high tech corporations, designs and implements large scale services and infrastructure for data intensive applications (metro, regional, national, and global). He is also Director of the Metropolitan Research and Education Network (MREN), an advanced high-performance network interlinking organization providing services in seven upper-Midwest states, which created the world's first GigaPoP, and the Director of the StarLight International/National Communications Exchange Facility in Chicago, a global exchange for advanced high performance networks, especially in support of global data-intensive scientific research. He has published multiple articles in peer-reviewed scholarly journals as well as several books on advanced networking.

Time: 12:45-13:45 Central Time (Chicago/US)

Abstract: Engineered computer networks play a critical role in our society. These networks have evolved into complex systems with behaviours and characteristics that are beyond the characterizations and predictions possible using traditional approaches of modelling, analysis, and design. While experimentation can be used for many purposes, our interest is in screening experiments to identify the important parameters that significantly impact measured responses such as throughput and delay. There are assumptions and limitations underlying many approaches to screening. One limitation of most experimental designs is that they do not identify unexpected behaviours resulting from cross-layer interactions. Similarly, in the analysis of the measurements, it is often assumed that the corresponding design is balanced. Overcoming assumptions and limitations based on the experimental needs of engineered networks leads to a number of challenging and novel problems in combinatorics, which we discuss, along with results of a large-scale screening experiment in w-iLab.t, a wireless network testbed. Joint work with Charles J. Colbourn, John Stufken, and Yasmeen Akhtar.
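A minimal example of what a screening experiment computes (illustrative only; the parameter names and response function are hypothetical, not the talk's actual design): run a two-level factorial over a few parameters and estimate each main effect on the measured response, flagging the parameters that matter.

```python
# Toy screening experiment: two-level full factorial over three factors,
# followed by main-effect estimation on the measured response.
from itertools import product

def measure(mtu, window, coding):
    # Hypothetical response surface: throughput depends mostly on window.
    return 10 * window + 1 * mtu + 0 * coding

levels = [-1, +1]                                   # low / high settings
runs = list(product(levels, levels, levels))        # 2^3 = 8 runs
responses = [measure(*run) for run in runs]

def main_effect(factor_index):
    # Mean response at the high level minus mean at the low level.
    hi = [y for run, y in zip(runs, responses) if run[factor_index] == +1]
    lo = [y for run, y in zip(runs, responses) if run[factor_index] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

effects = [main_effect(i) for i in range(3)]
assert effects == [2, 20, 0]   # "window" dominates; "coding" is inert
```

A real screening design would use far fewer runs than a full factorial (e.g., a fractional factorial or covering array), which is where the combinatorial questions in the talk arise.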

Bio: Violet Syrotiuk is an associate professor of computer science and engineering at Arizona State University. Her research interests include MAC protocols for wireless networks, dynamic adaptation of protocols to optimize performance for evolving network conditions, and programmable networks. Her work makes use of mid-scale experimental infrastructure such as the new FABRIC testbed. Dr. Syrotiuk serves on the editorial boards of the COMNET and COMCOM journals, is the Steering Committee Chair of the CNERT Workshop, and is a TPC Co-Chair of IFIP Networking 2023.

Time: 12:45-13:45 Central Time (Chicago/US)

Abstract: Experimental systems research is exciting, with multiple interesting directions to explore. First, one can focus on getting real-world data so that we can study how things work in the wild or even test out new ideas. Or, one can focus on building a networked system that is fundamentally different from the real world, catalyzing innovation. For me, network programmability played an essential role in both directions. In this talk, I will present two networked systems we have built. First, I will present P4Campus, a testbed that helps researchers study their own campus networks and also test new ideas against real production traffic. Next, I will present Pronto, an open-source end-to-end 5G network with deep programmability. I will share our experiences and lessons learned when building these networked systems. Finally, I will share some ideas for future work.

Bio: Hyojoon Kim is a Research Scholar in the Computer Science department at Princeton University, where he works and co-advises students with Jennifer Rexford. Hyojoon received his PhD and MS from Georgia Tech under the supervision of Nick Feamster and his BS from the University of Wisconsin - Madison. His research area includes network measurement, programmable networks, and distributed systems. He publishes in top conferences and journals, including ACM SIGCOMM, IMC, SIGMETRICS, CCR, SOSR, CHI, USENIX NSDI, and PoPETS. Previously, Hyojoon did several research internships at HP Labs in Palo Alto, USA.

Time: 12:45-13:45 Central Time (Chicago/US)

Abstract: Delivering a reliable network to customers is a top business priority for Azure Networking. Verifying every network configuration change before deployment can prevent misconfigurations from degrading reliability. This is a hard problem to solve, given the scale and complexity of today's networks. We will discuss progress in network verification research in Azure Networking and its applications in production. Joint work with Nuno Lopes, Karthick Jayaraman, Nuno Afonso, Ryan Beckett, Dragos Dumitrescu, and Jitu Padhye.
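The verification idea can be illustrated with a toy check (not Azure's tooling; the intent table and ACL format are hypothetical): before deploying a change, confirm that every intended flow stays reachable and every forbidden flow stays blocked, rejecting the change on any violation.

```python
# Toy change verification: check a proposed ACL against a reachability
# intent before it is deployed.
def allowed(acl, src, dst):
    for rule_src, rule_dst, action in acl:    # first match wins
        if rule_src in ("*", src) and rule_dst in ("*", dst):
            return action == "permit"
    return False                              # implicit deny

intent = {
    ("web", "db"): True,     # web tier must reach the database
    ("guest", "db"): False,  # guest network must never reach it
}

def verify_change(new_acl):
    return all(allowed(new_acl, s, d) == want
               for (s, d), want in intent.items())

good = [("web", "db", "permit"), ("*", "*", "deny")]
bad  = [("*", "db", "permit")]       # accidentally opens guest -> db

assert verify_change(good)
assert not verify_change(bad)
```

Production verifiers reason symbolically (e.g., with constraint solvers) rather than enumerating flows, but the contract is the same: a change is deployed only if the intent still holds.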

Bio: Andrey Rybalchenko likes solving cloud-scale problems with the help of logic, constraint solving, and automation. He's a senior principal researcher at Microsoft Research Cambridge, UK. Previously, Andrey was a professor of Computer Science at the Technical University of Munich, a researcher at the Max Planck Institute for Software Systems, a post-doc at EPFL, and a master's/PhD student at Saarland University. Andrey likes playing recreational ice hockey and hanging out with his family.

Time: 12:45-13:45 Central Time (Chicago/US)

Abstract: Disaggregation has recently emerged as a potent solution to the challenge of the growing resource consumption of datacenter infrastructure. According to the disaggregated datacenter vision, compute units (CPUs, GPUs, accelerators) are decoupled from the memory hierarchy, with all components connected by the datacenter fabric. To stem the rise of infrastructure costs, disaggregation emphasizes just-in-time resource allocation and re-purposing. When a workload demands resources, easily accessible, hot-swappable compute or memory components are removed where they are underutilized and added where they are needed most, on the fly, without deploying additional servers and adding excessive capacity. This is unlike traditional (i.e., aggregated) datacenters, where adding compute capacity requires the addition of entire new servers with expensive memory and network, which leads to underutilization of resources and increases costs for additional space, power, and management overhead. The goal of this talk is to draw attention to the emerging field of disaggregated systems, introduce the standardization efforts that industry is pursuing, and outline topics for future research in this key area.
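The allocation model described above can be sketched as follows (illustrative only; the class and numbers are hypothetical): resources are drawn from shared pools over the fabric, so a workload can acquire extra memory without provisioning an entire new server.

```python
# Toy disaggregated resource pool: allocate only what a workload asks
# for, rather than whole servers with fixed CPU:memory ratios.
class DisaggregatedPool:
    def __init__(self, cpus, mem_gb):
        self.cpus, self.mem_gb = cpus, mem_gb

    def allocate(self, cpus, mem_gb):
        # Just-in-time: grab only the requested resources, if available.
        if cpus <= self.cpus and mem_gb <= self.mem_gb:
            self.cpus -= cpus
            self.mem_gb -= mem_gb
            return True
        return False

    def release(self, cpus, mem_gb):
        # Hot-swap back into the pool for re-purposing elsewhere.
        self.cpus += cpus
        self.mem_gb += mem_gb

pool = DisaggregatedPool(cpus=64, mem_gb=1024)
assert pool.allocate(cpus=0, mem_gb=256)   # memory only, no new server
assert pool.cpus == 64 and pool.mem_gb == 768
pool.release(cpus=0, mem_gb=256)           # returned for reuse later
assert pool.mem_gb == 1024
```

In an aggregated datacenter, the same 256 GB request would have meant deploying a new server, stranding its CPUs.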

Bio: Sergey Blagodurov has been a Research Scientist at AMD since 2013. He works on redesigning datacenters for the Composable era. Prior to AMD, he was a Research Associate with HP Labs for three years, where he studied and contributed to the design and operation of net-zero energy data centers. Sergey received a PhD degree in Computer Science from Simon Fraser University, Vancouver, Canada in 2013.

Abstract: To provide a learning and testing environment for cybersecurity and network education and research, we have developed an open-source Internet Emulator (called SEED Emulator), which allows us to create a miniature Internet that can run inside a single personal machine or on multiple cloud machines. Even though it is small, it has all the essential elements of the real Internet. Many interesting network technologies can be deployed on the emulator. We have used this emulator to create a DNS infrastructure, a Botnet, a Darknet, an Internet worm, and BGP prefix hijacking attacks. Many more are being developed. We have also deployed the Ethereum blockchain on the emulator, creating a Blockchain emulator with tens or even hundreds of nodes, all inside a single computer.

This emulator has been used primarily for education since its release in August 2021, but recently several research groups have started to use it for their research. In this talk, I will present the design and features of the SEED Emulator and its applications in both research and education. I will also demonstrate some of the interesting hands-on lab activities based on the emulator.
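The BGP prefix hijacking mentioned above rests on a simple mechanism, longest-prefix-match forwarding, which can be demonstrated even without the emulator (a generic sketch, not SEED Emulator code; addresses and AS names are made up):

```python
# Generic illustration of BGP prefix hijacking: announcing a more
# specific prefix wins longest-prefix-match forwarding and steals traffic.
import ipaddress

def best_route(table, dst):
    # Longest-prefix match over (prefix, origin) announcements.
    matches = [(net, origin) for net, origin in table if dst in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

victim = (ipaddress.ip_network("203.0.113.0/24"), "AS-victim")
table = [victim]
dst = ipaddress.ip_address("203.0.113.10")
assert best_route(table, dst) == "AS-victim"

# The hijacker announces a more specific /25 covering the same hosts.
hijack = (ipaddress.ip_network("203.0.113.0/25"), "AS-hijacker")
table.append(hijack)
assert best_route(table, dst) == "AS-hijacker"
```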

Bio: Dr. Wenliang (Kevin) Du, IEEE Fellow, is the Laura J. and L. Douglas Meredith Professor at Syracuse University. His current research interest focuses on Internet/blockchain emulation and cybersecurity education. He received his bachelor’s degree from the University of Science and Technology of China in 1993 and Ph.D. degree from Purdue University in 2001. He founded the SEED-Labs open-source project in 2002. The cybersecurity lab exercises developed from this project are now being used by over 1000 institutes worldwide. His self-published book, “Computer & Internet Security: A Hands-on Approach”, has been adopted as a textbook by 246 institutes. His online courses published on Udemy frequently won the “best seller” and “highest rated” recognition. He is the recipient of the 2017 Academic Leadership award from the 21st Colloquium for Information System Security Education. His research has been sponsored by multiple grants from the National Science Foundation and Google. He is a recipient of the 2021 ACSAC Test-of-Time Award and the 2013 ACM CCS Test-of-Time Award.

Abstract: Software-Defined Networking (SDN) has held the industry's interest for the past few years. But why does the industry need SDN? What is the problem space addressed by SDN? What is the state of SDN in the industry today, in terms of the hardware and software technologies delivering pieces of SDN, the products shipping, and the industry drivers? This talk will attempt to address these questions from the ground up, to give the audience a sense of the maturity of the technologies and the industry.

Bio: Ashok Sunder Rajan is a Principal Engineer at Intel in the Ethernet Networking Division. He is part of the Linux Open Source Software Development team, where he drives technologies for operationally scalable 5G infrastructure. During his tenure at Intel Labs, Ashok initiated research on mobile packet core infrastructure and delivered the Open Mobile Evolved Core (OMEC), a comprehensive set of telecom infrastructure stacks integrating SDN technologies on Intel-based standard high-volume servers. OMEC, open sourced at the Open Networking Foundation (Linux Foundation), underpins the transformation of the $200Bn/year telecom network into cloud infrastructure.

Abstract: We will discuss practical experience with SDN systems at Google. We examine a few systems (B4 and Traffic Engineering, Orion, Bandwidth Enforcer) through the lens of a production incident, focusing on how Google approaches building and operating reliable SDN systems and challenges that arise in production environments. Large-scale distributed systems need to support critical workloads with high reliability expectations. They are “systems of systems” composed of many moving parts, with large teams of people developing and managing them, and SDNs for hyperscale networks are no exception. Therefore, this talk will cover both systems design challenges and non-technical skills that are important to tackling problems that arise in industry.

Bio: Rich joined Google after completing his PhD in Computer Science at Yale University in 2010. He spent 8 years in Bandwidth SRE, working on productionizing and expanding Google's admission control and traffic engineering systems. Along the way, Rich helped form an SRE team to grow Google's software-defined networking capabilities, then expanded into network reliability strategy. Rich is currently a Principal Engineer and an Uber Tech Lead within Core Networking SRE, the SRE organization responsible for Google's production network.