Abstract: Deduplication, which saves storage cost by storing only one copy of identical data, has become unprecedentedly significant with the dramatic increase in data stored in the cloud. To ensure data confidentiality, data are usually encrypted before being outsourced. Traditional encryption inevitably produces different ciphertexts from the same plaintext under different users' secret keys, which hinders data deduplication. Convergent encryption makes deduplication possible because it naturally encrypts identical plaintexts into identical ciphertexts. An attendant problem is how to reliably and effectively manage a huge number of convergent keys. Several deduplication schemes have been proposed to address the convergent key management problem; however, they either introduce key management servers or require interaction between data owners. In this paper, we design a novel client-side deduplication protocol named KeyD that needs no independent key management server, by utilizing the identity-based broadcast encryption (IBBE) technique. Users interact only with the cloud service provider (CSP) during data upload and download. Security analysis demonstrates that KeyD ensures data confidentiality and convergent key security while also protecting ownership privacy. A thorough performance comparison shows that our scheme achieves a better tradeoff among storage cost, communication overhead, and computation overhead.
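The convergent-encryption idea in the abstract can be illustrated with a minimal sketch. It assumes SHA-256 as the hash and uses a toy hash-based keystream in place of a real block cipher; the function names (`convergent_key`, `ce_encrypt`, `ce_decrypt`) and the tag choice T = H(C) are hypothetical illustrations, not the paper's construction. What it shows is the property deduplication relies on: two users who encrypt the same plaintext independently, with no shared secret, obtain the same ciphertext and tag.

```python
from __future__ import annotations

import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks. Illustrative only, not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_key(data: bytes) -> bytes:
    """Convergent key K = H(M): derived from the content, not from any user's secret."""
    return hashlib.sha256(data).digest()


def ce_encrypt(data: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (K, C, T). Deterministic: identical plaintexts give identical C and T."""
    key = convergent_key(data)
    keystream = _keystream(key, len(data))
    ciphertext = bytes(m ^ s for m, s in zip(data, keystream))
    tag = hashlib.sha256(ciphertext).digest()  # assumed tag T = H(C) used for duplicate checks
    return key, ciphertext, tag


def ce_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Anyone holding K can regenerate the keystream and recover M."""
    keystream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, keystream))
```

A short usage example:

```python
k1, c1, t1 = ce_encrypt(b"same file contents")  # Alice
k2, c2, t2 = ce_encrypt(b"same file contents")  # Bob, with no shared secret
assert c1 == c2 and t1 == t2                     # the CSP can keep a single copy
assert ce_decrypt(k1, c1) == b"same file contents"
```

Because K depends only on M, each distinct file brings its own convergent key that every owner must keep, which is why the number of keys grows with the stored data and why managing them is the problem KeyD targets.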
What problem does this paper attempt to address?
This paper addresses how to achieve data deduplication in cloud storage while ensuring data confidentiality and ownership privacy. Specifically, it focuses on the following points:
1. **The contradiction between data deduplication and encryption**: To protect data outsourced to the cloud, users usually encrypt it before uploading. However, traditional encryption produces different ciphertexts for the same data under different users' keys, which hinders deduplication. The paper adopts convergent encryption (CE), which encrypts identical content into identical ciphertext and therefore supports deduplication.
2. **The challenges of convergent key management**: Although convergent encryption enables deduplication, the number of convergent keys grows linearly with the amount of data, which places a heavy burden on key management. The paper proposes a new client-side deduplication protocol, KeyD, which uses identity-based broadcast encryption (IBBE) to manage convergent keys, avoiding the need for an independent key management server or a trusted third party.
3. **Security and privacy protection**: The proposed scheme ensures data confidentiality and also protects convergent key security and ownership privacy. Specifically, it guarantees:
- Data confidentiality: A user cannot obtain ownership of data they do not actually possess by running the proof-of-ownership (PoW) protocol with the cloud service provider (CSP), and users who have not proven ownership cannot decrypt the ciphertext stored in the cloud.
- Semantic security of convergent keys: Even if an attacker obtains some encrypted convergent keys, they cannot recover the convergent keys of files that do not belong to them, nor can they distinguish between convergent keys given their encrypted versions.
- Ownership privacy: Which users own a given data copy is hidden from malicious users and from other users who share the same copy. Users who do not own the data cannot learn who owns it, and data owners cannot learn who else shares it with them, even while interacting with the CSP to update ownership.
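The client-side flow behind these guarantees (send a tag first, then either upload the ciphertext or prove ownership of an existing copy) can be sketched as follows. This is a simplified model under stated assumptions: the tag is SHA-256 of the already-encrypted data, and the PoW is a plain challenge-response over randomly sampled blocks bound to a fresh nonce, swapped in for illustration rather than the construction the paper uses. The IBBE-based distribution of convergent keys and the ownership-privacy mechanisms are not modeled, and all names (`ToyCSP`, `client_upload`, `BLOCK_SIZE`, `CHALLENGE_K`) are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

BLOCK_SIZE = 16   # hypothetical block size for ownership challenges
CHALLENGE_K = 3   # hypothetical number of blocks sampled per challenge


def blocks(data: bytes) -> list[bytes]:
    """Split (already convergent-encrypted) data into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]


def tag_of(data: bytes) -> str:
    """Deduplication tag the client sends instead of the file (assumed: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def pow_response(data: bytes, indices: list[int], nonce: bytes) -> str:
    """Hash the challenged blocks with a fresh nonce so an old answer cannot be replayed."""
    h = hashlib.sha256(nonce)
    bs = blocks(data)
    for i in indices:
        h.update(i.to_bytes(8, "big"))
        h.update(bs[i])
    return h.hexdigest()


class ToyCSP:
    """Hypothetical CSP: one stored copy per tag plus the users who proved ownership."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}
        self.owners: dict[str, set[str]] = {}
        self.pending: dict[tuple[str, str], tuple[list[int], bytes]] = {}

    def check(self, user: str, tag: str) -> tuple[list[int], bytes] | None:
        """Step 1: return None if the tag is new (upload needed), else a PoW challenge."""
        if tag not in self.store:
            return None
        n = len(blocks(self.store[tag]))
        indices = sorted(secrets.SystemRandom().sample(range(n), min(CHALLENGE_K, n)))
        nonce = secrets.token_bytes(16)
        self.pending[(user, tag)] = (indices, nonce)
        return indices, nonce

    def upload(self, user: str, tag: str, data: bytes) -> None:
        """Step 2a: the first uploader sends the ciphertext itself."""
        if tag_of(data) != tag:
            raise ValueError("tag does not match uploaded data")
        self.store[tag] = data
        self.owners[tag] = {user}

    def prove(self, user: str, tag: str, answer: str) -> bool:
        """Step 2b: later uploaders answer the challenge instead of re-sending the file."""
        indices, nonce = self.pending.pop((user, tag))
        expected = pow_response(self.store[tag], indices, nonce)
        if hmac.compare_digest(answer, expected):
            self.owners[tag].add(user)
            return True
        return False


def client_upload(csp: ToyCSP, user: str, data: bytes) -> str:
    """Client-side deduplication: send the tag first, upload only if the CSP lacks the data."""
    tag = tag_of(data)
    challenge = csp.check(user, tag)
    if challenge is None:
        csp.upload(user, tag, data)
        return "stored"
    indices, nonce = challenge
    return "deduplicated" if csp.prove(user, tag, pow_response(data, indices, nonce)) else "rejected"
```

A short usage example, where a user who knows only the tag tries to claim the file:

```python
csp = ToyCSP()
ct = b"\x01" * 64                      # stands in for a convergent-encryption ciphertext
assert client_upload(csp, "alice", ct) == "stored"
assert client_upload(csp, "bob", ct) == "deduplicated"
indices, nonce = csp.check("mallory", tag_of(ct))
assert not csp.prove("mallory", tag_of(ct), pow_response(bytes(64), indices, nonce))
```

Sampling only a few blocks keeps the proof cheap, but a user who holds part of the file can pass with some probability; real PoW schemes tune this tradeoff more carefully.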
In summary, this paper aims to design a secure and efficient client-side deduplication scheme by combining convergent encryption and identity-based broadcast encryption technologies to solve the challenges of data deduplication and key management in cloud storage.