Abstract: Multi-level buffer cache hierarchies are now common in client/server cluster configurations, especially in today's big data application deployments. However, the multi-level caching policies deployed so far typically run an independent cache replacement algorithm at each level, which has two major drawbacks: (1) file blocks may be cached redundantly on multiple levels, reducing the aggregate usable cache size; (2) replacement decisions at lower-level caches become less accurate because the locality they observe is weakened. This inefficient use of cache resources may cause noticeable performance degradation for big data applications. To address these problems, we propose new adaptive multi-level exclusive caching policies that dynamically adjust replacement and placement decisions in response to changing access patterns. First, to capture locality information in multi-level cache hierarchies, we propose a Reuse Distance based Adaptive Replacement Caching (ReDARC) algorithm that adopts reuse distance as its measure of locality and adaptively balances between the Small Reuse Distance (SRD) set and the Large Reuse Distance (LRD) set. Second, to achieve exclusive caching and make global caching decisions, we propose an Adaptive Level-Aware Caching Algorithm (ALACA) that works collaboratively with ReDARC. ALACA uses an adaptive probabilistic PUSH technique that allows lower caches to push blocks to higher caches and, together with ReDARC, decides where each block should be cached. In this way, we achieve multi-level exclusive caching with significant cache performance improvement. Our trace-driven simulation experiments show that the proposed policies reduce the average client response time by 8 percent to 56 percent compared with other multi-level cache schemes. (C) 2017 Elsevier B.V. All rights reserved.
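To make the notion of reuse distance concrete, the following is a minimal illustrative sketch, not the ReDARC algorithm itself. It assumes the common definition of reuse distance as the number of distinct other blocks accessed between two consecutive accesses to the same block. The function names `reuse_distances` and `split_by_reuse_distance`, and the fixed `threshold` parameter, are hypothetical; the abstract states that ReDARC balances the SRD and LRD sets adaptively, which this sketch does not attempt.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple


def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """For each access, count the distinct other blocks touched since the
    previous access to the same block (None for a first access)."""
    last_seen: Dict[Hashable, int] = {}
    distances: List[Optional[int]] = []
    for i, block in enumerate(trace):
        if block in last_seen:
            # Distinct blocks strictly between the two accesses.
            between = set(trace[last_seen[block] + 1:i])
            distances.append(len(between))
        else:
            distances.append(None)
        last_seen[block] = i
    return distances


def split_by_reuse_distance(
    trace: Sequence[Hashable], threshold: int
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Split re-accessed blocks into a small-reuse-distance set and a
    large-reuse-distance set using a fixed, hypothetical threshold."""
    latest: Dict[Hashable, int] = {}
    for block, dist in zip(trace, reuse_distances(trace)):
        if dist is not None:
            latest[block] = dist  # keep the most recent distance per block
    srd = {b for b, d in latest.items() if d <= threshold}
    lrd = {b for b, d in latest.items() if d > threshold}
    return srd, lrd


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b"]
    print(reuse_distances(trace))
    print(split_by_reuse_distance(trace, threshold=1))
```

The slice-based computation is quadratic in the trace length and is chosen only for readability; a simulator processing long traces would normally use a tree- or stack-based structure instead.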

SP-Cache: Load-Balanced, Redundancy-Free Cluster Caching with Selective Partition

Achieving Load-Balanced, Redundancy-Free Cluster Caching with Selective Partition

PLC-cache: Endurable SSD Cache for Deduplication-Based Primary Storage

Improving In-Memory File System Reading Performance by Fine-Grained User-Space Cache Mechanisms

Efficient Cache Resource Aggregation Using Adaptive Multi-Level Exclusive Caching Policies

DASH: A duplication-aware flash cache architecture in virtualization environment

Fusion-Cache: A Refactored Content-Aware Host-Side SSD Cache

Optimal Cache Partition-Sharing

Improving Performance of Parallel I/O Systems Through Selective and Layout-Aware SSD Cache

Small File Read Performance Optimization Based on Redis Cache in Distributed Storage System

Content Caching Clustering Based on Piecewise Interest Similarity

S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems

A Duplication-Aware SSD-Based Cache Architecture for Primary Storage in Virtualization Environment

Sampling-based Caching for Low Latency in Distributed Coded Storage Systems

JeCache: Just-Enough Data Caching with Just-in-Time Prefetching for Big Data Applications

Adaptive Cache Policy Scheduling for Big Data Applications on Distributed Tiered Storage System

IO Dependent SSD Cache Allocation for Elastic Hadoop Applications

Adaptive Online Cache Capacity Optimization via Lightweight Working Set Size Estimation at Scale

DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed Caching

Accelerating MapReduce on Commodity Clusters: an SSD-Empowered Approach

Effectiveness and predictability of in-network storage cache for scientific workflows