Abstract: As a core component in modern data centers, key-value cache provides high-throughput and low-latency services for high-speed data processing. The effectiveness of a key-value cache relies on its ability to accommodate the needed data. However, expanding the cache capacity is often more difficult than commonly expected because of many practical constraints, such as server costs, cooling issues, rack space, and even human resource expenses. A potential solution is compression, which virtually extends the cache capacity by condensing data in cache. In practice, this seemingly simple idea has not gained much traction in key-value cache system design, due to several critical issues: the compression-unfriendly index structure, severe read/write amplification, wasteful decompression operations, and heavy computing cost. This paper presents a hybrid DRAM-SSD cache design to realize a systematic integration of data compression in key-value cache. By treating compression as an essential component, we have redesigned the indexing structure and data management, and leveraged the emerging computational SSD hardware for collaborative optimizations. We have developed a prototype, called ZipCache. Our experimental results show that ZipCache can achieve up to 72.4% higher throughput and 42.4% lower latency, while reducing the write amplification by up to 26.2 times.
What problem does this paper attempt to address?
This paper addresses how to expand the effective capacity of key-value cache systems in modern data centers without significantly increasing hardware costs, so that they can keep up with growing data processing demands. Specifically, it explores virtually expanding cache capacity through data compression and proposes systematic solutions to several key problems that have kept compression from being adopted in existing key-value cache systems. These problems include:
1. **Compression-unfriendly index structure**: Most key-value cache systems use hash-based index structures, which distribute data randomly and therefore work against effective compression.
2. **Severe read/write amplification**: Key-value workloads usually contain a large number of small data items, and compressing each small item on its own yields little benefit. To obtain a reasonable compression ratio, multiple small items must be packed and compressed together, but this significantly increases the amount of data touched per access, that is, read/write amplification.
3. **Compression and decompression pull in opposite directions**: Data blocks are usually compressed and decompressed as a whole. A larger compression granularity improves the compression ratio, but it makes decompression less efficient, because more data must be decompressed to reach the requested key-value pair.
4. **High computational cost of compression**: Compression and decompression are computationally intensive. They occupy precious CPU resources and degrade the performance of front-end services, especially in scenarios with strict cache latency requirements.
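The trade-off described in problems 2 and 3 can be illustrated with a small experiment. The sketch below is not taken from the paper: it uses Python's standard `zlib` module as a stand-in compressor and a synthetic workload (`make_items` is a hypothetical helper) to compare compressing each small item on its own against compressing packed batches, and it estimates how many bytes must be decompressed to serve a single lookup when a whole batch is the decompression unit.

```python
import json
import zlib


def make_items(n=256):
    """Build synthetic small key-value items (a hypothetical workload)."""
    items = []
    for i in range(n):
        key = f"user:{i:06d}".encode()
        value = json.dumps(
            {"id": i, "name": f"user{i}", "status": "active", "region": "us-east"}
        ).encode()
        items.append((key, value))
    return items


def raw_size(items):
    """Total uncompressed bytes of all keys and values."""
    return sum(len(k) + len(v) for k, v in items)


def per_item_size(items):
    """Total bytes when every item is compressed on its own."""
    return sum(len(zlib.compress(k + v)) for k, v in items)


def packed_size(items, batch):
    """Total bytes when `batch` consecutive items are compressed together."""
    total = 0
    for start in range(0, len(items), batch):
        blob = b"".join(k + v for k, v in items[start:start + batch])
        total += len(zlib.compress(blob))
    return total


def read_amplification(items, batch):
    """Average bytes decompressed per lookup divided by average item size,
    assuming the whole batch must be decompressed to read one item."""
    sizes = [len(k) + len(v) for k, v in items]
    blocks = [sum(sizes[i:i + batch]) for i in range(0, len(sizes), batch)]
    return (sum(blocks) / len(blocks)) / (sum(sizes) / len(sizes))


if __name__ == "__main__":
    items = make_items()
    print("raw bytes:          ", raw_size(items))
    print("per-item compressed:", per_item_size(items))
    for batch in (4, 16, 64):
        print(f"batch={batch:3d} compressed={packed_size(items, batch):6d} "
              f"read amplification={read_amplification(items, batch):.1f}x")
```

On data like this, larger batches compress better, while the cost of a single lookup grows roughly in proportion to the batch size. The exact numbers depend on the compressor and the workload, so they should be read as an illustration rather than a measurement of any real system.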
To solve these problems, the paper proposes a new hybrid DRAM/SSD cache design called ZipCache. It treats data compression as a core component of cache system design and redesigns the index structure, data management, and hardware co-optimization strategy accordingly. Specific measures include:
- **Abandoning the traditional hash index structure** and adopting a B+ tree index to manage key-value items, which preserves content similarity and spatial locality.
- **Introducing super-leaf nodes** that store key-value data in a virtualized SSD storage space, relying on commercial SSDs with built-in transparent compression to achieve low-cost indexing without wasting physical storage space.
- **Decoupling the data units of compression and decompression** by creating a special in-page structure that supports on-demand decompression, which reduces decompression time and read amplification.
- **Making full use of computational SSDs with built-in transparent compression** by offloading compression from the CPU to the storage device, which reduces the computational burden and eliminates potential interference with front-end services.
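The summary does not describe ZipCache's in-page layout, so the following is only a sketch of one way to get the effect of on-demand decompression, under stated assumptions. A page keeps its items in key order (as a B+ tree leaf would), splits them into small chunks that are compressed independently with `zlib`, and keeps an uncompressed directory of each chunk's first key. A lookup binary-searches the directory and decompresses a single chunk. The `Page` class, the chunk size, and the length-prefixed encoding are all hypothetical choices made for illustration.

```python
import bisect
import struct
import zlib


def encode(items):
    """Serialize (key, value) byte pairs with 2-byte length prefixes."""
    out = bytearray()
    for k, v in items:
        out += struct.pack("<HH", len(k), len(v)) + k + v
    return bytes(out)


def decode(raw):
    """Yield (key, value) pairs from a buffer produced by encode()."""
    pos = 0
    while pos < len(raw):
        klen, vlen = struct.unpack_from("<HH", raw, pos)
        pos += 4
        yield raw[pos:pos + klen], raw[pos + klen:pos + klen + vlen]
        pos += klen + vlen


class Page:
    """A page of sorted items stored as independently compressed chunks."""

    def __init__(self, items, items_per_chunk=8):
        items = sorted(items)
        self.first_keys = []   # uncompressed directory: first key of each chunk
        self.chunks = []       # independently compressed chunks
        self.raw_size = 0      # uncompressed size of the whole page
        self.bytes_decompressed = 0
        for start in range(0, len(items), items_per_chunk):
            group = items[start:start + items_per_chunk]
            raw = encode(group)
            self.raw_size += len(raw)
            self.first_keys.append(group[0][0])
            self.chunks.append(zlib.compress(raw))

    def get(self, key):
        """Return the value for key, decompressing only one chunk."""
        idx = bisect.bisect_right(self.first_keys, key) - 1
        if idx < 0:
            return None  # key sorts before every key in the page
        raw = zlib.decompress(self.chunks[idx])
        self.bytes_decompressed += len(raw)
        for k, v in decode(raw):
            if k == key:
                return v
        return None


if __name__ == "__main__":
    page = Page([(f"k{i:04d}".encode(), f"value-{i}".encode()) for i in range(64)])
    print(page.get(b"k0010"))
    print("decompressed", page.bytes_decompressed, "of", page.raw_size, "bytes")
```

Compressing chunks independently gives up some compression ratio compared with compressing the whole page at once. Avoiding that loss is presumably one reason the paper pairs its in-page structure with device-side transparent compression, but the details would need to be checked against the paper itself.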
Experimental results show that, compared with existing state-of-the-art solutions, ZipCache raises throughput by up to 72.4% and lowers 90th-percentile read latency by up to 42.4%, while reducing SSD write amplification by up to 26.2 times.