The Design of a Lossless Deduplication Scheme to Eliminate Fine-grained Redundancy for JPEG Image Storage Systems
Cai Deng, Xiangyu Zou, Qi Chen, Bo Tang, Wen Xia
DOI: https://doi.org/10.1109/tc.2024.3363456
IF: 3.183
2024-01-01
IEEE Transactions on Computers
Abstract: Image data storage has grown explosively, so image deduplication is used to save storage by eliminating redundancy between different images. However, traditional image deduplication can neither eliminate fine-grained redundancy nor guarantee lossless results. In this work, we propose imDedup, a lossless and fine-grained deduplication scheme for JPEG image storage systems. Specifically, imDedup uses a novel sampling hash method, Feature Bitmap, to detect similar images quickly by exploiting the information distribution of JPEG data. It also uses Idelta, a novel delta encoder that incorporates image compression into deduplication, to guarantee that the non-redundant data can be re-compressed via image encoding, which improves the compression ratio. In addition, we propose the DCHash and Fixed-Point Matching (FPM) techniques to further speed up Idelta. We also propose imDedup-plus, which dynamically chooses the DCHash-based or FPM-based compressor to achieve higher throughput without sacrificing the compression ratio. Experimental results demonstrate the superiority of the imDedup-based methods on five datasets. Compared with the state-of-the-art similarity detector and delta encoder, imDedup achieves 1.8–4.4× higher throughput and 1.3–1.7× higher compression ratios, respectively. Moreover, imDedup-plus achieves 1.3–2.9× higher throughput than imDedup without sacrificing the compression ratio.
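The general idea of detecting similar objects with a sampled-hash bitmap can be illustrated with a minimal sketch. This is not the paper's Feature Bitmap algorithm, which operates on JPEG data and is not specified in the abstract; it is a generic stand-in over raw bytes. The names `feature_bitmap` and `similarity`, the window size, the sampling mask, and the bitmap width are all hypothetical choices made for illustration.

```python
import hashlib

BITMAP_BITS = 1024  # hypothetical bitmap width, not taken from the paper


def feature_bitmap(data: bytes, window: int = 8, sample_mask: int = 0x3) -> int:
    """Build a bitmap (stored as a Python int) from sampled window hashes.

    Generic sampling-hash sketch: every `window`-byte substring is hashed,
    and only hashes whose low bits match the mask (about 1 in 4 here) set
    a bit in the bitmap.
    """
    bitmap = 0
    for i in range(len(data) - window + 1):
        digest = hashlib.blake2b(data[i:i + window], digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h & sample_mask == 0:  # content-defined sampling
            bitmap |= 1 << ((h >> 8) % BITMAP_BITS)
    return bitmap


def similarity(a: int, b: int) -> float:
    """Jaccard similarity between two bitmaps (shared bits / all set bits)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0
    return bin(a & b).count("1") / union


# Deterministic pseudo-random test data (assumption: raw bytes stand in for image data).
base = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
near = base[:1000] + b"\x00" + base[1001:]  # one byte changed
other = b"".join(hashlib.sha256(bytes([i, 1])).digest() for i in range(64))

print(similarity(feature_bitmap(base), feature_bitmap(near)))
print(similarity(feature_bitmap(base), feature_bitmap(other)))
```

In a deduplication pipeline of this kind, a pair whose bitmap similarity exceeds some threshold would be handed to a delta encoder, while dissimilar objects would be stored independently. The threshold and the sampling rate are tuning parameters in this sketch.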
engineering, electrical & electronic; computer science, hardware & architecture