Abstract: As data volumes continue to grow exponentially, demand for large-capacity storage systems keeps increasing. Data compression effectively reduces the volume of written data and improves space efficiency, and many modern SSDs have therefore incorporated compression capabilities. However, compression adds processing overhead to critical I/O paths, which can degrade system performance. Most current compression solutions in flash-based storage systems apply a single fixed compression algorithm to all incoming data and ignore differences among data access patterns, which leads to sub-optimal compression efficiency. This paper proposes a data-type-aware Flash Translation Layer (DAFTL) scheme that maximizes space efficiency without compromising system performance. First, we propose an I/O behavior prediction method that forecasts future accesses to specific data. DAFTL then matches data types with distinct I/O behaviors to compression algorithms of varying intensity, balancing performance against space efficiency. Specifically, it applies higher-intensity compression algorithms to less frequently accessed data to maximize space efficiency, and lower-intensity but faster algorithms to frequently accessed data to maintain system performance. Finally, we propose an improved compact compression method that eliminates page fragmentation and further improves space efficiency. Extensive evaluations using a variety of real-world workloads, as well as workloads containing real data collected on our platforms, show that DAFTL achieves greater data reduction than competing approaches. Compared with state-of-the-art compression schemes, DAFTL reduces the total number of pages written to the SSD by an average of 8%, 21.3%, and 25.6% for data with high, medium, and low compressibility, respectively.
For workloads with real data, DAFTL reduces the total number of pages written to the SSD by 10.4% on average. Furthermore, DAFTL delivers comparable or better read and write performance than other solutions.
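The idea of matching compression intensity to predicted access frequency can be illustrated with a small sketch. The abstract does not specify DAFTL's predictor, thresholds, or algorithms, so everything below is an assumption. `HotnessPredictor` is a hypothetical exponentially decayed access counter per logical page. `HOT_THRESHOLD`, `WARM_THRESHOLD`, and `DECAY` are made-up values. Python's standard `zlib` (fast, lighter) and `lzma` (slow, stronger) stand in for whatever compressors DAFTL actually uses.

```python
import lzma
import zlib
from collections import defaultdict

# Hypothetical parameters: DAFTL's real values are not given in the abstract.
HOT_THRESHOLD = 8.0   # decayed access score at or above which a page counts as hot
WARM_THRESHOLD = 2.0  # score at or above which a page counts as warm
DECAY = 0.5           # fraction of the old score kept at each epoch boundary


class HotnessPredictor:
    """Assumed stand-in for an I/O behavior predictor: a decayed access count per page."""

    def __init__(self) -> None:
        self.score: defaultdict[int, float] = defaultdict(float)

    def record_access(self, lpn: int) -> None:
        self.score[lpn] += 1.0

    def end_epoch(self) -> None:
        # Age every score so that recent accesses dominate the prediction.
        for lpn in self.score:
            self.score[lpn] *= DECAY

    def classify(self, lpn: int) -> str:
        s = self.score.get(lpn, 0.0)
        if s >= HOT_THRESHOLD:
            return "hot"
        if s >= WARM_THRESHOLD:
            return "warm"
        return "cold"


def compress_for_class(data: bytes, hotness: str) -> tuple[str, bytes]:
    """Pick a compressor whose intensity matches the predicted access frequency."""
    if hotness == "hot":
        return "zlib-1", zlib.compress(data, 1)      # fastest, least compression
    if hotness == "warm":
        return "zlib-6", zlib.compress(data, 6)      # middle trade-off
    return "lzma-9", lzma.compress(data, preset=9)   # slowest, strongest compression


def decompress(tag: str, blob: bytes) -> bytes:
    """Invert compress_for_class using the algorithm tag stored with the data."""
    if tag.startswith("zlib"):
        return zlib.decompress(blob)
    return lzma.decompress(blob)


if __name__ == "__main__":
    predictor = HotnessPredictor()
    for _ in range(10):
        predictor.record_access(7)   # page 7 is rewritten often
    predictor.record_access(42)      # page 42 is touched once

    page = b"sensor reading 0042;" * 200
    for lpn in (7, 42):
        tag, blob = compress_for_class(page, predictor.classify(lpn))
        assert decompress(tag, blob) == page
        print(lpn, predictor.classify(lpn), tag, len(page), "->", len(blob))
```

Storing the algorithm tag next to each compressed unit matters in any such design. A page whose hotness changes later can still be decompressed correctly, because the tag records the algorithm that was actually used.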
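Page fragmentation arises when each variable-size compressed chunk occupies its own fixed-size flash page, which wastes the unused tail of that page. The abstract does not describe how DAFTL's compact compression method works. The sketch below therefore swaps in a standard first-fit-decreasing bin-packing heuristic to show how packing several chunks into one page reduces the number of pages written. The 4 KiB `PAGE_SIZE` and the function names are illustrative assumptions.

```python
import math

PAGE_SIZE = 4096  # assumed flash page size in bytes


def naive_pages(chunk_sizes: list[int]) -> int:
    """One compressed chunk per page: the unused tail of each page is wasted."""
    return sum(math.ceil(size / PAGE_SIZE) for size in chunk_sizes)


def pack_first_fit_decreasing(chunk_sizes: list[int]) -> list[list[int]]:
    """Pack chunks (each <= PAGE_SIZE bytes) into pages using first-fit decreasing.

    Returns one list per page, holding the indices of the chunks stored in it.
    """
    order = sorted(range(len(chunk_sizes)), key=lambda i: chunk_sizes[i], reverse=True)
    pages: list[list[int]] = []
    free: list[int] = []  # remaining bytes in each open page
    for i in order:
        size = chunk_sizes[i]
        if size > PAGE_SIZE:
            raise ValueError(f"chunk {i} ({size} B) exceeds the page size")
        for p, room in enumerate(free):
            if size <= room:      # first open page with enough room
                pages[p].append(i)
                free[p] -= size
                break
        else:                     # no open page fits: start a new one
            pages.append([i])
            free.append(PAGE_SIZE - size)
    return pages


if __name__ == "__main__":
    sizes = [1500, 2600, 900, 3100, 1200, 700]  # compressed chunk sizes in bytes
    packed = pack_first_fit_decreasing(sizes)
    print("one chunk per page:", naive_pages(sizes), "pages")
    print("packed:", len(packed), "pages ->", packed)
```

For these six chunks (10,000 bytes in total), one chunk per page needs 6 pages, while the packed layout fits into 3, which is also the lower bound of ceil(10000 / 4096). A real FTL would also need per-page metadata recording each chunk's offset and length, and that metadata is omitted here.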