Abstract: Storage cache hierarchies include diverse topologies, assorted parameters and policies, and devices with varied performance characteristics. Simulation enables efficient exploration of their configuration space while avoiding expensive physical experiments. Miss Ratio Curves (MRCs) efficiently characterize the performance of a cache over a range of cache sizes, revealing "key points" for cache simulation, such as knees in the curve that immediately follow sharp cliffs. Unfortunately, there are no automated techniques for efficiently finding key points in MRCs, and the cross-application of existing knee-detection algorithms yields inaccurate results. We present a multi-stage framework that identifies key points in any MRC, for both stack-based (e.g., LRU) and more sophisticated eviction algorithms (e.g., ARC). Our approach quickly locates candidates using efficient hash-based sampling, curve simplification, knee detection, and novel post-processing filters. We introduce Z-Method, a new multi-knee detection algorithm that employs statistical outlier detection to choose promising points robustly and efficiently. We evaluated our framework against seven other knee-detection algorithms, identifying key points in multi-tier MRCs with both ARC and LRU policies for 106 diverse real-world workloads. Compared to naïve approaches, our framework reduced the total number of points needed to accurately identify the best two-tier cache hierarchies by an average factor of approximately 5.5× for ARC and 7.7× for LRU. We also show how our framework can be used to seed the initial population for evolutionary algorithms. We ran 32,616 experiments requiring over three million cache simulations, on 151 samples, from three datasets, using a diverse set of population initialization techniques, evolutionary algorithms, knee-detection algorithms, cache replacement algorithms, and stopping criteria. Our results showed an overall acceleration rate of 34% across all configurations.
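To make the idea of "knees that follow cliffs" concrete, the following is a minimal illustrative sketch, not the paper's Z-Method or its framework. It assumes an exact LRU miss ratio curve built from stack distances (no hash-based sampling or curve simplification) and flags knee candidates as cache sizes whose discrete second difference is a statistical outlier by z-score. The function names `lru_miss_ratio_curve` and `knee_candidates` and the threshold of 1.5 are hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def lru_miss_ratio_curve(trace, max_size):
    """Return miss ratios for LRU caches of size 1..max_size.

    Uses stack distances: a reference hits in an LRU cache of size C
    exactly when its stack distance is <= C. This is a simple O(n * m)
    version meant for small illustrative traces only.
    """
    stack = []  # least recently used first, most recently used last
    hits_at_distance = [0] * (max_size + 1)
    for key in trace:
        if key in stack:
            # Distance 1 means the key was the most recently used item.
            distance = len(stack) - stack.index(key)
            stack.remove(key)
            if distance <= max_size:
                hits_at_distance[distance] += 1
        stack.append(key)

    curve = []
    hits = 0
    for size in range(1, max_size + 1):
        hits += hits_at_distance[size]
        curve.append(1.0 - hits / len(trace))
    return curve


def knee_candidates(curve, z_threshold=1.5):
    """Return cache sizes whose curvature is an upper outlier.

    curve[k] is the miss ratio at cache size k + 1. A knee right after a
    cliff shows up as a large positive second difference. The z-score
    threshold is an assumption for this sketch, not a published value.
    """
    sizes = []
    curvature = []
    for i in range(1, len(curve) - 1):
        sizes.append(i + 1)
        curvature.append(curve[i - 1] - 2 * curve[i] + curve[i + 1])

    if not curvature:
        return []
    mu = mean(curvature)
    sigma = pstdev(curvature)
    if sigma == 0:
        return []  # flat or perfectly linear curve: no outliers
    return [s for s, c in zip(sizes, curvature) if (c - mu) / sigma > z_threshold]


if __name__ == "__main__":
    # A cyclic scan over four blocks thrashes any LRU cache smaller than 4.
    trace = [0, 1, 2, 3] * 50
    curve = lru_miss_ratio_curve(trace, max_size=8)
    print(curve)
    print(knee_candidates(curve))
```

On this synthetic cyclic trace the miss ratio stays at 1.0 for sizes 1 to 3 and falls to the cold-miss floor at size 4, so size 4 is the only point flagged. Real MRCs are noisier, which is presumably why the abstract describes simplification and post-processing stages around the detection step.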

Live Forensics for Distributed Storage Systems

Kakute: A Precise, Unified Information Flow Analysis System for Big-data Security

This is Why We Can't Cache Nice Things: Lightning-Fast Threat Hunting using Suspicion-Based Hierarchical Storage

Towards Optimizing Storage Costs on the Cloud

R-Store: A scalable distributed system for supporting real-time analytics

24/7 Characterization of petascale I/O workloads

Performance and Fault Tolerance in the StoreTorrent Parallel Filesystem

Pangea: Monolithic Distributed Storage for Data Analytics

SEAL: Storage-efficient Causality Analysis on Enterprise Logs with Query-friendly Compression

Accelerating Filesystem Checking and Repair with pFSCK

Toward scalable monitoring on large-scale storage for software defined cyberinfrastructure

Linking the Dynamic PicoProbe Analytical Electron-Optical Beam Line / Microscope to Supercomputers

Solving Big Data Challenges for Enterprise Application Performance Management

Live Recovery of Bit Corruptions in Datacenter Storage Systems

Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories

I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis

Accelerating multi-tier storage cache simulations using knee detection

A Multi-Level, Multi-Scale Visual Analytics Approach to Assessment of Multifidelity HPC Systems

The Panasas ActiveScale Storage Cluster - Delivering Scalable High Bandwidth Storage

Exploring Scientific Application Performance Using Large Scale Object Storage

Scalable Persistent Memory File System with Kernel-Userspace Collaboration