Abstract: In today's data centers, memory-based key-value systems, such as Memcached and Redis, play an indispensable role in providing high-speed data services. The rapidly growing capacity and quickly falling price of DRAM in the past years have made it possible to build large memory-based key-value stores that serve hundreds of gigabytes to even terabytes of key-value data entirely in memory. Unfortunately, the CPU cache in modern processors has not seen similar growth in capacity and remains at the level of a few dozen megabytes. Such an extremely low cache-to-memory ratio (less than 0.1%) poses a significant new challenge: the limited CPU cache is becoming a severe performance bottleneck that keeps us from fully exploiting the potential of high-speed memory-based key-value stores. To address this challenge, we propose a highly cache-efficient scheme, called Cavast, to optimize the cache utilization of large-capacity in-memory key-value stores. Our goal is to maximize cache efficiency and system performance without any hardware changes. We first present two lightweight, software-only mechanisms that enable users to indirectly control the cache content at the application level. We then propose a set of optimization policies to address several critical design issues that impair the cache's efficacy in current key-value store systems. By carefully reorganizing the data layout in memory, redesigning the hash indexing structure, and offloading garbage collection, we can effectively improve the utilization of the limited cache space. We have developed a Linux kernel module to provide kernel-level support and implemented two prototypes, based on Memcached and Redis, with the proposed Cavast scheme. Our experimental studies show promising results. On a 6-core Intel Xeon processor with only a 15-MB cache, we can raise the cache hit ratio up to 82.7% with a very small cache-to-memory ratio (0.023%) and significantly increase key-value system throughput by a factor of up to 4.2.
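
To make the cache-to-memory ratio concrete, the short sketch below computes it for the 15-MB cache mentioned in the abstract. The 64-GB data set size is an assumption chosen for illustration only; the abstract does not state the memory size used in the experiments. The function name `cache_to_memory_ratio` is likewise hypothetical.

```python
def cache_to_memory_ratio(cache_bytes: int, memory_bytes: int) -> float:
    """Return the cache capacity as a percentage of the memory capacity."""
    return 100.0 * cache_bytes / memory_bytes


MB = 1024 ** 2
GB = 1024 ** 3

# 15 MB of cache comes from the abstract.
# 64 GB of in-memory data is an ASSUMPTION for illustration.
ratio = cache_to_memory_ratio(15 * MB, 64 * GB)
print(f"cache-to-memory ratio: {ratio:.3f}%")
```

Under this assumed data set size, the ratio rounds to 0.023%, well below the 0.1% level the abstract describes as extremely low.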
