DDS: DPU-optimized Disaggregated Storage [Extended Report]

Qizhen Zhang, Philip Bernstein, Badrish Chandramouli, Jiasheng Hu, Yiming Zheng
2024-08-28
Abstract: This extended report presents DDS, a novel disaggregated storage architecture enabled by emerging networking hardware, namely DPUs (Data Processing Units). DPUs can optimize the latency and CPU consumption of disaggregated storage servers. However, utilizing DPUs for DBMSs requires careful design of the network and storage paths and the interface exposed to the DBMS. To fully benefit from DPUs, DDS heavily uses DMA, zero-copy, and userspace I/O to minimize overhead when improving throughput. It also introduces an offload engine that eliminates host CPUs by executing client requests directly on the DPU. Adopting DDS' API requires minimal DBMS modification. Our experimental study and production system integration show promising results -- DDS achieves higher disaggregated storage throughput with an order of magnitude lower latency, and saves up to tens of CPU cores per storage server.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses the high latency and high CPU consumption caused by separating storage from compute (i.e., storage disaggregation) in cloud-native database management systems (DBMSs). Specifically:

1. **High Latency Issue**: In current disaggregated storage architectures, each data request passes through multiple layers, including the network and file modules of the operating system kernel and the I/O stack of the DBMS, which leads to high request latency.
2. **High CPU Consumption Issue**: As storage bandwidth grows, so do the CPU resources needed to process storage requests. The cost is especially significant under read-heavy workloads.

To address these issues, the paper proposes DDS (DPU-optimized Disaggregated Storage), which uses emerging networking hardware, namely Data Processing Units (DPUs), to reduce the latency and CPU consumption of storage servers. DPUs can handle storage requests directly, relieving the host CPU, and DDS minimizes overhead through DMA, zero-copy, and user-space I/O, thereby improving throughput. DDS also introduces an offload engine that executes client requests directly on the DPU, so those requests never consume host CPU cycles.

Experimental results show that DDS achieves higher disaggregated storage throughput with an order of magnitude lower latency, and saves up to tens of CPU cores per storage server.
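The central idea of the offload engine is that some client requests can be answered on the DPU without ever reaching the host CPU. The toy dispatcher below illustrates only that routing decision. It is a minimal sketch under stated assumptions: the names `OffloadDispatcher`, `can_offload`, and `offloadable_files` are hypothetical, and the policy (serve on the DPU only reads of files whose location the DPU already knows, forward everything else to the host) is an illustrative assumption rather than DDS's actual API or offload predicate. It does not model DMA, zero-copy, or user-space I/O.

```python
from dataclasses import dataclass, field
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class Request:
    op: Op
    file_id: int
    offset: int
    length: int


@dataclass
class OffloadDispatcher:
    """Toy DPU-side request router (hypothetical; not DDS's real interface).

    Requests accepted by can_offload() are served on the DPU; all others
    are forwarded to the host CPU, which is the only path that costs host cycles.
    """
    # Hypothetical: files whose on-disk location the DPU has cached.
    offloadable_files: set = field(default_factory=set)
    served_on_dpu: int = 0
    forwarded_to_host: int = 0

    def can_offload(self, req: Request) -> bool:
        # Assumed policy: only reads of files the DPU can locate by itself.
        return req.op is Op.READ and req.file_id in self.offloadable_files

    def dispatch(self, req: Request) -> str:
        # Route the request and count where it was handled.
        if self.can_offload(req):
            self.served_on_dpu += 1
            return "dpu"
        self.forwarded_to_host += 1
        return "host"


if __name__ == "__main__":
    dispatcher = OffloadDispatcher(offloadable_files={1, 2})
    requests = [
        Request(Op.READ, 1, 0, 8192),   # known file, read -> DPU
        Request(Op.WRITE, 1, 0, 8192),  # write -> host
        Request(Op.READ, 3, 0, 8192),   # unknown file -> host
    ]
    print([dispatcher.dispatch(r) for r in requests])  # ['dpu', 'host', 'host']
```

In this model, every request routed to `"dpu"` is one the host CPU never processes, which is the mechanism behind the CPU savings the summary describes; the actual savings depend on which requests the real offload engine can serve.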