Abstract: Emerging cloud-native distributed databases rely on local NVMe SSDs to provide high-performance and highly available data services to many cloud applications. However, these database clusters suffer from low utilization of local storage because of the imbalance between CPU and storage capacities within each node. For instance, the OceanBase distributed database cluster, with hundreds of PB of local storage capacity, utilizes only around 40% of its local storage. Although disaggregated storage (EBS) can enhance storage utilization by provisioning CPU and storage independently on demand, it suffers from performance bottlenecks and high costs. In this paper, we propose LightPool, a high-performance and lightweight storage pool architecture deployed at large scale in the OceanBase clusters to enhance storage resource utilization. The key idea of LightPool is to aggregate cluster storage into a storage pool and enable unified management. In particular, LightPool adopts NVMe-oF to enable high-performance storage resource sharing among cluster nodes and integrates the storage pool with Kubernetes to achieve flexible management and allocation of storage resources. Furthermore, we design hot-upgrade and hot-migration mechanisms to enhance the availability of LightPool. We have deployed LightPool on over 8500 nodes in production clusters. Statistics show that LightPool improves storage resource utilization from about 40% to 65%. Experimental results show that the extra latency from LightPool is only about 2.1 μs compared to local storage. Compared to OpenEBS, LightPool enhances bandwidth by up to 190.9% in microbenchmarks and throughput by up to 6.9% in real-world applications. LightPool is a best practice for deploying NVMe-oF (NVMe/TCP) in production environments. We also discuss important lessons and experiences learned from the development of LightPool.

LTNoT: Realizing the Trade-Offs Between Latency and Throughput in NVMe over TCP.

A Transformable NVMeoF Queue Design for Better Differentiating Read and Write Request Processing

Load-aware Transmission Mechanism for NVMeoF Storage Networks

Performance Characterization of SmartNIC NVMe-over-Fabrics Target Offloading

Torp: Full-Coverage and Low-Overhead Profiling of Host-Side Latency

Toward Full-Coverage and Low-Overhead Profiling of Network-Stack Latency

Alleviating Performance Interference Through Intra-Queue I/O Isolation for NVMe-over-Fabrics.

An Ultra-Low Latency and Compatible PCIe Interconnect for Rack-Scale Communication.

HyQ: Hybrid I/O Queue Architecture for NVMe over Fabrics to Enable High-Performance Hardware Offloading

Scalable Multi-Session TCP Offload Engine for Latency-Sensitive Applications

LTTP: An LT-Code Based Transport Protocol for Many-To-One Communication in Data Centers

A Throughput-Oriented NVMe Storage Virtualization with Workload-Aware Management

LightPool: A NVMe-oF-based High-performance and Lightweight Storage Pool Architecture for Cloud-Native Distributed Database

Low Latency TOE with Double-Queue Structure for 10Gbps Ethernet on FPGA

High-performance and Scalable Software-based NVMe Virtualization Mechanism with I/O Queues Passthrough

Latte: A Native Table Engine on NVMe Storage

Implementation of TCP Large Receive Offload on Multi-Core NPU Platform

Understanding Performance of I/O Intensive Containerized Applications for NVMe SSDs

Hardware TCP Offload Engine Based on 10-Gbps Ethernet for Low-Latency Network Communication.

Achieving High Throughput by Transparent Network Interface Virtualization on Multi-core Systems

TSoR: TCP Socket over RDMA Container Network for Cloud Native Computing