SuperCDC: A Hybrid Design of High-Performance Content-Defined Chunking for Fast Deduplication

Binzhaoshuo Wan, Lifeng Pu, Xiangyu Zou, Shiyi Li, Peng Wang, Wen Xia
DOI: https://doi.org/10.1109/ICCD56317.2022.00034
2022-01-01
Abstract: Content-Defined Chunking (CDC) has been widely applied in data deduplication systems because it can detect much more redundant data than Fixed-Size Chunking (FSC). CDC approaches have become steadily faster to keep pace with high-performance storage systems, and there are two main kinds of acceleration mechanisms: calculation-efficient acceleration and stream-informed acceleration. We observe an opportunity to combine the benefits of these two mechanisms to achieve a faster speed, along with two challenges: memory overhead and deduplication ratio loss, which are caused by stream histories and the configuration of minimum/maximum chunk size, respectively. Motivated by these observations, we propose SuperCDC with several corresponding techniques, including hybridizing calculation-efficient processing with a stream-informed design, a memory-efficient structure for tracking stream history, and Min-Max chunking for improving the deduplication ratio. Evaluations suggest that SuperCDC achieves up to 4.94× faster chunking speed and a 6.23% higher deduplication ratio while saving 99.58% of the memory space used for stream histories, compared with the state-of-the-art Gear-based RapidCDC.
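For readers unfamiliar with the baseline, the following is a minimal sketch of generic Gear-hash content-defined chunking with minimum and maximum chunk sizes. It is not SuperCDC or RapidCDC, and it omits any stream-informed logic. The function name `gear_chunks`, the seeded random gear table, and the default size parameters are all illustrative assumptions rather than values from the paper.

```python
import random

# Hypothetical gear table: 256 pseudo-random 64-bit values.
# The fixed seed is an assumption made only so runs are reproducible.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def gear_chunks(data: bytes, min_size: int = 2048, avg_bits: int = 13,
                max_size: int = 65536):
    """Yield content-defined chunks of `data` (illustrative sketch only).

    A cut point is declared where the low `avg_bits` bits of the rolling
    Gear hash are all zero. Bytes below `min_size` are skipped without
    hashing, and a cut is forced once a chunk reaches `max_size`.
    """
    cut_mask = (1 << avg_bits) - 1
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_size, n)   # forced cut at the maximum size
        cut = end
        h = 0
        i = start + min_size             # skip the minimum-size region
        while i < end:
            # Gear rolling hash: shift left by one bit, add the table entry.
            h = ((h << 1) + GEAR[data[i]]) & MASK64
            i += 1
            if h & cut_mask == 0:        # content-defined boundary found
                cut = i
                break
        yield data[start:cut]
        start = cut
```

Because boundaries depend on content rather than on fixed offsets, an insertion near the start of a stream tends to shift only nearby boundaries, which is the property that lets CDC find more duplicates than FSC. The minimum and maximum sizes bound the chunk length but can force cuts that ignore content, which is the source of the deduplication ratio loss the abstract mentions.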