D3: An Adaptive Reconfigurable Datacenter Network

Johannes Zerwas, Chen Griner, Stefan Schmid, Chen Avin
2024-06-19
Abstract: The explosively growing communication traffic in datacenters imposes increasingly stringent performance requirements on the underlying networks. Over the last years, researchers have developed innovative optical switching technologies that enable reconfigurable datacenter networks (RDCNs) which support very fast topology reconfigurations. This paper presents D3, a novel and feasible RDCN architecture that improves throughput and flow completion time. D3 quickly and jointly adapts its links and packet scheduling toward the evolving demand, combining both demand-oblivious and demand-aware behaviors when needed. D3 relies on a decentralized network control plane supporting greedy, integrated-multihop, IP-based routing, allowing it to react, quickly and locally, to topological changes without overheads. A rack-local synchronization and transport layer further support fast network adjustments. Moreover, we argue that D3 can be implemented using the recently proposed Sirius architecture (SIGCOMM 2020). We report on an extensive empirical evaluation using packet-level simulations. We find that D3 improves throughput by up to 15% and preserves competitive flow completion times compared to the state of the art. We further provide an analytical explanation of the superiority of D3, introducing an extension of the well-known Birkhoff-von Neumann decomposition, which may be of independent interest.
Networking and Internet Architecture
What problem does this paper attempt to address?
The paper addresses the increasingly stringent performance requirements that rapidly growing communication traffic imposes on datacenter networks (DCNs). Data-intensive cloud applications (such as batch processing and distributed machine learning) and the trend toward disaggregating datacenter resources push DCNs against their capacity limits. To raise that capacity, the paper proposes D3, a novel and feasible reconfigurable datacenter network (RDCN) architecture that improves throughput and flow completion time by quickly and jointly adapting its links and packet scheduling. D3 combines demand-oblivious and demand-aware behaviors, which lets it adjust its topology as demand changes. The main contributions of the paper are:

1. **Proposing the D3 architecture**: D3 can be implemented using the Sirius architecture and combines different sub-topology components (static, demand-oblivious, and demand-aware) that quickly adapt to changing traffic.
2. **Efficient link and packet scheduling**: D3 relies on a decentralized network control plane that supports greedy, multi-hop, IP-based routing and reacts quickly and locally to topology changes without additional overhead.
3. **Transport layer solutions**: D3 provides different transport protocols for the different sub-topology components, so that each type of traffic is handled appropriately.
4. **Theoretical foundation**: The paper introduces an extension of the Birkhoff-von Neumann matrix decomposition that supports hybrid topology types and gives an analytical explanation of D3's advantages.
5. **Experimental evaluation**: In extensive packet-level simulations, D3 improves throughput by up to 15% while keeping flow completion times competitive with the state of the art.
In summary, by designing and evaluating D3, the paper targets the performance challenges that changing traffic patterns create in datacenter networks, and offers a flexible architecture that adapts both topology and scheduling to demand.
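For readers unfamiliar with the Birkhoff-von Neumann decomposition mentioned above, the following is a minimal sketch of the classical version only, not of the extension the paper introduces. It assumes a normalized demand matrix that is doubly stochastic (every row and column sums to 1) and splits it into weighted permutation matrices; each permutation can be read as one matching between racks, held for a fraction of time equal to its weight. The function names `perfect_matching` and `bvn_decompose` and the example matrix are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def perfect_matching(support):
    """Find a perfect matching with Kuhn's augmenting-path algorithm.

    support[i][j] is True if row i may be matched to column j.
    Returns a list perm with perm[i] = column matched to row i, or None.
    """
    n = len(support)
    col_owner = [-1] * n  # col_owner[j] = row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if col_owner[j] == -1 or try_row(col_owner[j], seen):
                    col_owner[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(col_owner):
        perm[i] = j
    return perm


def bvn_decompose(matrix):
    """Classical Birkhoff-von Neumann decomposition (hypothetical helper).

    Expects a doubly stochastic matrix given as Fractions or integers.
    Returns a list of (weight, perm) pairs whose weighted permutation
    matrices sum to the input.
    """
    n = len(matrix)
    rest = [[Fraction(x) for x in row] for row in matrix]
    terms = []
    while any(x > 0 for row in rest for x in row):
        # Match only along entries that still carry demand.
        perm = perfect_matching([[x > 0 for x in row] for row in rest])
        if perm is None:
            raise ValueError("input is not doubly stochastic")
        # The smallest matched entry bounds how long this matching is useful.
        weight = min(rest[i][perm[i]] for i in range(n))
        for i in range(n):
            rest[i][perm[i]] -= weight
        terms.append((weight, perm))
    return terms


# Illustrative 3-rack demand matrix (an assumption, not data from the paper).
half = Fraction(1, 2)
demand = [[half, half, 0], [half, 0, half], [0, half, half]]
for weight, perm in bvn_decompose(demand):
    print(weight, perm)
```

Exact `Fraction` arithmetic is used so that the loop terminates cleanly when the remainder reaches zero; a floating-point version would need a tolerance instead of the `x > 0` test.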