Ultra-Low-Latency and Flexible In-memory Key-Value Store System Design on CPU-FPGA
Yunhui Qiu, Hankun Lv, Jinyu Xie, Wenbo Yin, Lingli Wang
DOI: https://doi.org/10.1109/fpt.2018.00030
2018-01-01
Abstract: In-memory key-value stores (KVS) are critical infrastructure in data centers. As big data technology develops, they face challenges in performance and power consumption, mainly because of the low efficiency of the multi-level memory hierarchy in CPU-based systems. Remote direct memory access (RDMA) technology partly alleviates these problems, but it is still inefficient for KVS, especially for the PUT operation. In this paper, we present an ultra-low-latency and flexible in-memory KVS system based on a CPU-FPGA heterogeneous architecture, in which the FPGA serves as a KVS accelerator. We design a highly parallel accelerator architecture with several novel techniques, including memory pre-allocation, fragmentation processing, and a decoupling design, to achieve ultra-low latency, high flexibility, efficiency, and scalability. Because the decoupling design stores the hash table in onboard DRAM and the values in host memory, the system workload can scale with the storage capacity. Each KVS operation needs at most one PCIe DMA transfer, which keeps efficiency high. Compared with current hardware-based KVS systems, the proposed system is more flexible: its supported value range is 4x wider (from 1 byte to 4M bytes). Over 10 Gbps Ethernet, the peak throughput reaches 13.6 million key-value operations per second (Mops), nearly fully utilizing the Ethernet bandwidth. The system latency is as low as 1.2 µs for the PUT operation and 1.7 µs for the GET operation, which is 3.8x and 2.0x faster, respectively, than current state-of-the-art KVS systems.
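The decoupling idea in the abstract (index kept near the accelerator, values kept in host memory, at most one DMA per operation) can be illustrated with a small software model. The sketch below is only a toy under stated assumptions: the class name `DecoupledKVS`, the fixed slot size, the free-list allocator, and the `dma_count` counter are all hypothetical and are not taken from the paper. It does not model fragmentation processing, values larger than one slot, or any hardware parallelism.

```python
# Illustrative model only: names, sizes, and the allocator policy are
# assumptions made for this sketch, not details from the paper.

class DecoupledKVS:
    """Toy model: index in 'onboard DRAM', values in pre-allocated 'host memory'."""

    def __init__(self, host_bytes=1 << 16, slot_bytes=64):
        self.host_memory = bytearray(host_bytes)  # stands in for host memory
        self.slot_bytes = slot_bytes
        # Memory pre-allocation: the free list of fixed-size slots is built up front.
        self.free_slots = list(range(host_bytes // slot_bytes))
        self.index = {}     # stands in for the hash table in onboard DRAM
        self.dma_count = 0  # number of simulated PCIe DMA transfers

    def _dma_write(self, offset, data):
        """Simulate one DMA transfer into host memory."""
        self.host_memory[offset:offset + len(data)] = data
        self.dma_count += 1

    def _dma_read(self, offset, length):
        """Simulate one DMA transfer out of host memory."""
        self.dma_count += 1
        return bytes(self.host_memory[offset:offset + length])

    def put(self, key, value):
        if len(value) > self.slot_bytes:
            raise ValueError("value larger than one slot in this toy model")
        entry = self.index.get(key)
        if entry is None:
            if not self.free_slots:
                raise MemoryError("no free slots")
            slot = self.free_slots.pop()  # take a pre-allocated slot
        else:
            slot = entry[0]               # overwrite in place
        self._dma_write(slot * self.slot_bytes, value)  # the only DMA for PUT
        self.index[key] = (slot, len(value))

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None  # miss is resolved from the index alone, no DMA
        slot, length = entry
        return self._dma_read(slot * self.slot_bytes, length)  # the only DMA for GET


kvs = DecoupledKVS()
kvs.put(b"user:1", b"hello")
value = kvs.get(b"user:1")
missing = kvs.get(b"user:2")
```

In this model a PUT or a GET hit costs exactly one simulated DMA and a GET miss costs none, which matches the "at most one PCIe DMA" property stated in the abstract. Index capacity and value capacity are also sized independently, which is the point of the decoupling.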