DedupeSwift: Object-Oriented Storage System Based on Data Deduplication

Jingwei Ma, Gang Wang, Xiaoguang Liu
DOI: https://doi.org/10.1109/trustcom.2016.0177
2016-01-01
Abstract: Recent years have witnessed an explosion in the volume of data. To cope with this rapid growth, cloud storage has been proposed as a cost-efficient and reliable data storage service. As data volumes grow, the data centers that provide cloud storage need ever more storage resources to meet demand. Data deduplication is a technique that removes redundant data blocks, and it has been used to reduce the storage footprint of backup and archival systems. In this paper, we propose DedupeSwift, which is based on OpenStack Swift, an open-source object-oriented storage system widely used in public and private clouds. DedupeSwift introduces data deduplication to reduce storage overhead. To limit the performance overhead that deduplication brings, a lazy method is introduced to relieve the disk I/O bottleneck. Compression and caching are also used to improve read performance. Experimental results show that DedupeSwift reduces storage overhead by 65.24% and 89.84% on two data sets while maintaining favorable upload and download throughput.
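To make the idea of removing redundant data blocks concrete, the following is a minimal sketch of block-level deduplication with compression. It is not DedupeSwift's implementation: the `DedupStore` class, the fixed 4 KiB chunk size, the SHA-256 fingerprints, and the use of zlib are all assumptions chosen for illustration, since the abstract does not specify them. The lazy method and the cache mentioned in the abstract are not modeled here.

```python
import hashlib
import zlib


class DedupStore:
    """Toy in-memory chunk store (hypothetical; not DedupeSwift's code)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # assumed fixed-size chunking
        self.chunks = {}              # fingerprint -> compressed chunk bytes
        self.objects = {}             # object name -> ordered fingerprints

    def put(self, name, data):
        """Split data into chunks and store each distinct chunk only once."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.chunks:
                # Only previously unseen chunks consume storage.
                self.chunks[fingerprint] = zlib.compress(chunk)
            recipe.append(fingerprint)
        self.objects[name] = recipe

    def get(self, name):
        """Rebuild an object by decompressing its chunks in order."""
        return b"".join(
            zlib.decompress(self.chunks[fp]) for fp in self.objects[name]
        )


# Two objects that share their first 4 KiB block.
data_a = b"A" * 4096 + b"B" * 4096
data_b = b"A" * 4096 + b"C" * 4096

store = DedupStore()
store.put("a", data_a)
store.put("b", data_b)

referenced = sum(len(recipe) for recipe in store.objects.values())
unique = len(store.chunks)
```

In this example the two objects reference four chunks in total, but only three distinct chunks are stored because the shared block is kept once. A real system would also need persistent fingerprint indexing and reference counting for deletion, which this sketch omits.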