Abstract: In today's data centers, memory-based key-value systems, such as Memcached and Redis, play an indispensable role in providing high-speed data services. The rapidly growing capacity and quickly falling price of DRAM in recent years have made it possible to build large memory-based key-value stores that serve hundreds of gigabytes to even terabytes of key-value data entirely in memory. Unfortunately, CPU cache capacity in modern processors has not seen similar growth and remains at a few dozen megabytes. Such an extremely low cache-to-memory ratio (less than 0.1%) poses a significant new challenge: the limited CPU cache is becoming a severe performance bottleneck that hinders us from fully exploiting the potential of high-speed memory-based key-value stores. To address this challenge, we propose a highly cache-efficient scheme, called Cavast, to optimize the cache utilization of large-capacity in-memory key-value stores. Our goal is to maximize cache efficiency and system performance without any hardware changes. We first present two lightweight, software-only mechanisms that enable users to indirectly control the cache content at the application level. We then propose a set of optimization policies to address several critical design issues that impair cache efficacy in current key-value store systems. By carefully reorganizing the data layout in memory, redesigning the hash indexing structure, and offloading garbage collection, we can effectively improve the utilization of the limited cache space. We have developed a Linux kernel module to provide kernel-level support, and implemented two prototypes based on Memcached and Redis with the proposed Cavast scheme. Our experimental studies show promising results. On a 6-core Intel Xeon processor with only a 15-MB cache, we can raise the cache hit ratio up to 82.7% with a very small cache-to-memory ratio (0.023%), and significantly increase the key-value system throughput by a factor of up to 4.2.
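
The cache-to-memory ratio quoted in the abstract is simple arithmetic: cache capacity divided by the size of the in-memory data set. The short Python sketch below works it through. The 15-MB cache comes from the abstract; the 64-GB memory size is an assumption chosen only because it roughly reproduces the quoted 0.023% and is not stated in the abstract. The function name is hypothetical.

```python
# Cache-to-memory ratio: cache capacity divided by in-memory data-set size.
MB = 1024 ** 2
GB = 1024 ** 3


def cache_to_memory_ratio(cache_bytes: int, memory_bytes: int) -> float:
    """Return the cache-to-memory ratio as a percentage."""
    return 100.0 * cache_bytes / memory_bytes


cache = 15 * MB    # from the abstract: 15-MB CPU cache
memory = 64 * GB   # ASSUMPTION: memory size is not given in the abstract

ratio = cache_to_memory_ratio(cache, memory)
print(f"cache-to-memory ratio: {ratio:.3f}%")
```

Under this assumption the ratio is well below the 0.1% threshold mentioned in the abstract, which means only a tiny fraction of the key-value data can be resident in the CPU cache at any moment.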

Earncache: Self-Adaptive Incremental Caching For Big Data Applications

JeCache: Just-Enough Data Caching with Just-in-Time Prefetching for Big Data Applications.

Efficient Cache Resource Aggregation Using Adaptive Multi-Level Exclusive Caching Policies

Put an Elephant into a Fridge

“Anti-Caching”-based elastic memory management for Big Data

Agile-Ant: Self-Managing Distributed Cache Management for Cost Optimization of Big Data Applications

SP-Cache: Load-Balanced, Redundancy-Free Cluster Caching with Selective Partition

DartCache: A HashMap-Based Distributed Cache

Content Caching Clustering Based on Piecewise Interest Similarity

Adaptive Cache Policy Scheduling for Big Data Applications on Distributed Tiered Storage System.

Achieving Load-Balanced, Redundancy-Free Cluster Caching with Selective Partition

A Spatial and Temporal Locality-Aware Adaptive Cache Design with Network Optimization for Tiled Many-Core Architectures.

Improving In-Memory File System Reading Performance by Fine-Grained User-Space Cache Mechanisms

Improving reading performance by file prefetching mechanism in distributed cache systems

Icache: an Importance-Sampling-Informed Cache for Accelerating I/O-Bound DNN Model Training.

Towards Intelligent Adaptive Edge Caching using Deep Reinforcement Learning

SAC: Dynamic Caching Upon Sketch for In-Memory Big Data Analytics

Adaptive Online Cache Capacity Optimization via Lightweight Working Set Size Estimation at Scale.

Joint Cache Size Scaling and Replacement Adaptation for Small Content Providers.

An Application-Oriented Cache Allocation and Prefetching Method for Long-Running Applications in Distributed Storage Systems

Caching For Non-Independent Content: Improving Information Gathering In Constrained Networks