Abstract: The Internet of Things (IoT) generates substantial data through sensors for diverse applications, such as healthcare services. This article addresses the challenge of using the limited resources of IoT-enabled sensors efficiently to improve data collection, transmission, and storage. Sensors that cover overlapping areas transmit redundant data, which incurs additional communication and storage costs. Existing schemes, namely Asymmetric Extremum (AE) and Rapid Asymmetric Maximum (RAM), employ fixed and variable-sized windows during chunking. However, these schemes have difficulty selecting the index value that determines the variable window size: the index may remain zero or very low, resulting in poor deduplication. This article resolves the issue with the proposed Controlled Cut-point Identification Algorithm (CCIA), which restricts the variable-sized window to a certain threshold. The index value that determines the threshold is always larger than half the size of the fixed window, which helps to find more duplicates. An upper-limit offset is also applied to avoid an unnecessarily large window, which would incur extensive computation costs. Extensive simulations are performed by deploying Windows Communication Foundation services in the Azure cloud. The results demonstrate the superiority of CCIA across various metrics, including chunk number, average chunk size, minimum and maximum chunk number, variable chunking size, and probability of failure for cut-point identification. Compared with its competitors, RAM and AE, CCIA performs better across key parameters. Specifically, CCIA outperforms RAM and AE, respectively, in total number of chunks (6.81%, 14.17%), average number of chunks (4.39%, 18.45%), and minimum chunk size (153%, 190%). These results highlight the effectiveness of CCIA in optimizing data transmission and storage within IoT systems, and its potential for improved resource utilization and reduced operational costs.
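
The windowing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CCIA: the abstract does not give the algorithm's details, so the code below is an assumed AE-style extremum chunker with two added constraints that mirror the stated properties. The search for the extremum starts past half the fixed window, and a hard upper limit caps the chunk size. The function name `cut_points` and the parameters `window` and `max_chunk` are hypothetical.

```python
def cut_points(data: bytes, window: int = 16, max_chunk: int = 128) -> list[int]:
    """Return exclusive end offsets of content-defined chunks.

    Assumed sketch, not the published CCIA:
    - the extremum search starts at an index greater than window / 2,
      so the variable-sized part of the window is never zero or tiny;
    - max_chunk acts as an upper-limit offset that bounds the scan.
    """
    min_index = window // 2 + 1  # always larger than half the fixed window
    if window < 1 or max_chunk < min_index + window + 1:
        raise ValueError("max_chunk is too small for the chosen window")

    cuts: list[int] = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_chunk, n)  # hard upper limit for this chunk
        cut = end                        # fallback: cut at the limit or at end of data
        max_pos = start + min_index      # first candidate extremum position
        if max_pos < end:
            max_val = data[max_pos]
            for i in range(max_pos + 1, end):
                if data[i] > max_val:
                    # New local maximum: the fixed window restarts from here.
                    max_val, max_pos = data[i], i
                elif i - max_pos == window:
                    # A full fixed window passed without a larger byte: cut here.
                    cut = i + 1
                    break
        cuts.append(cut)
        start = cut
    return cuts
```

Under these assumptions, every chunk except possibly the last is at least `window // 2 + 1 + window + 1` bytes long and at most `max_chunk` bytes long, so the cost of the scan stays bounded while very small chunks are avoided.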
