Abstract: Nowadays, many customers and enterprises back up their data to cloud storage that performs deduplication to save storage space and network bandwidth. Hence, how to perform secure deduplication has become a critical challenge for cloud storage. According to our analysis, the state-of-the-art secure deduplication methods are not suitable for cross-user fine-grained data deduplication. They either suffer from brute-force attacks that can recover files falling into a known set, or incur large computation (time) overheads. Moreover, existing approaches to convergent key management incur large space overheads because of the huge number of chunks shared among users. Our observation that cross-user redundant data come mainly from duplicate files motivates us to propose an efficient secure deduplication scheme, SecDep. SecDep employs User-Aware Convergent Encryption (UACE) and Multi-Level Key management (MLK) approaches. (1) UACE combines cross-user file-level and inside-user chunk-level deduplication, and exploits different security policies among and inside users to minimize the computation overheads. Specifically, both file-level and chunk-level deduplication use variants of Convergent Encryption (CE) to resist brute-force attacks. The major difference is that the file-level CE keys are generated by using a server-aided method to ensure the security of cross-user deduplication, while the chunk-level keys are generated by using a user-aided method with lower computation overheads. (2) To reduce key space overheads, MLK uses the file-level key to encrypt chunk-level keys so that the key space will not increase with the number of sharing users. Furthermore, MLK splits the file-level keys into share-level keys and distributes them to multiple key servers to ensure the security and reliability of file-level keys. Our security analysis demonstrates that SecDep ensures data confidentiality and key security. Our experimental results based on several large real-world datasets show that SecDep is more time-efficient and key-space-efficient than the state-of-the-art secure deduplication approaches.
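
The key hierarchy described above can be illustrated with a minimal Python sketch. It is not the SecDep implementation, and every function name and secret in it is hypothetical. It makes three simplifying assumptions: the server-aided and user-aided key generation steps are both approximated by an HMAC over the content hash, keyed with a server-held or user-held secret; the wrapping of chunk-level keys under the file-level key uses a toy SHA-256 keystream rather than a real cipher; and the split into share-level keys uses a simple XOR scheme that needs all shares, whereas the reliability mentioned in the abstract would need a threshold scheme.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 32  # bytes; matches the SHA-256 digest size


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def file_level_key(file_bytes: bytes, server_secret: bytes) -> bytes:
    # Assumption: stand-in for server-aided key generation. The key depends
    # on the file content and on a secret held only by the key server, so
    # identical files yield identical keys across users.
    return hmac.new(server_secret, sha256(file_bytes), hashlib.sha256).digest()


def chunk_level_key(chunk: bytes, user_secret: bytes) -> bytes:
    # Assumption: stand-in for user-aided key generation. The key depends
    # on the chunk content and a per-user secret, computed locally.
    return hmac.new(user_secret, sha256(chunk), hashlib.sha256).digest()


def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream (hash of key plus counter); illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += sha256(key + counter.to_bytes(8, "big"))
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def wrap_chunk_keys(file_key: bytes, chunk_keys: list[bytes]) -> bytes:
    # Encrypt all chunk-level keys under the single file-level key.
    blob = b"".join(chunk_keys)
    return xor_bytes(blob, keystream(file_key, len(blob)))


def unwrap_chunk_keys(file_key: bytes, wrapped: bytes) -> list[bytes]:
    blob = xor_bytes(wrapped, keystream(file_key, len(wrapped)))
    return [blob[i:i + KEY_LEN] for i in range(0, len(blob), KEY_LEN)]


def split_key(key: bytes, n: int) -> list[bytes]:
    # XOR-based n-of-n split: all n shares are required to rebuild the key.
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = key
    for share in shares:
        last = xor_bytes(last, share)
    return shares + [last]


def combine_shares(shares: list[bytes]) -> bytes:
    key = bytes(len(shares[0]))
    for share in shares:
        key = xor_bytes(key, share)
    return key


if __name__ == "__main__":
    data = b"backup payload " * 100
    chunks = [data[i:i + 256] for i in range(0, len(data), 256)]

    f_key = file_level_key(data, b"hypothetical-server-secret")
    c_keys = [chunk_level_key(c, b"hypothetical-user-secret") for c in chunks]

    wrapped = wrap_chunk_keys(f_key, c_keys)
    shares = split_key(f_key, 3)  # one share per key server

    recovered = combine_shares(shares)
    print(unwrap_chunk_keys(recovered, wrapped) == c_keys)
```

In this sketch a user stores only the wrapped chunk keys, and only the file-level key is split across key servers, which mirrors the idea that the key space does not grow with the number of users sharing a file.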
