EF-Dedup: Enabling Collaborative Data Deduplication at the Network Edge

Shijing Li, Tian Lan, Bharath Balasubramanian, Moo-Ryong Ra, Hee Won Lee, Rajesh K. Panta
DOI: https://doi.org/10.1109/ICDCS.2019.00102
2019-01-01
Abstract: The advent of IoT and edge computing will lead to massive amounts of data that need to be collected and transmitted to online storage systems. To address this problem, we push data deduplication to the network edge. Specifically, we propose a new technique for collaborative edge-facilitated deduplication (EF-dedup), wherein we partition the resource-constrained edge nodes into disjoint clusters, maintain a deduplication index structure for each cluster using a distributed key-value store, and perform decentralized deduplication within those clusters. Partitioning the nodes is challenging because it involves a novel tradeoff: edge nodes with highly correlated data may not always be within the same edge cloud, and communication among them incurs non-trivial network cost. We address this challenge by first formulating an optimization problem to partition the edge nodes, considering both the data similarities across the nodes and the inter-node network cost. We prove that the problem is NP-hard, provide bounded heuristics to solve it, and build a prototype EF-dedup system. Our experiments on EF-dedup, performed on edge nodes in an AT&T research lab and a central cloud at AWS, demonstrate that EF-dedup achieves 38.3-118.5% better deduplication throughput than purely cloud-based techniques and 43.4-60.2% lower aggregate cost in terms of the network-storage tradeoff than approaches that solely favor one over the other.
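The idea of a per-cluster deduplication index can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed-size chunking, the SHA-256 fingerprints, the `ClusterIndex` class, and the `CHUNK_SIZE` value are all assumptions made for illustration, and a plain in-memory dictionary stands in for the distributed key-value store mentioned in the abstract.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


class ClusterIndex:
    """Stand-in for one cluster's shared deduplication index.

    A plain dict replaces the distributed key-value store; in a real
    deployment every node in the cluster would query the same store.
    """

    def __init__(self):
        self.seen = {}  # fingerprint -> chunk length

    def dedup(self, data):
        """Return only the chunks this cluster has not seen before."""
        unique = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.seen:
                self.seen[fingerprint] = len(chunk)
                unique.append(chunk)
        return unique


# Two edge nodes in the same cluster share one index.
index = ClusterIndex()
sent_by_node_a = index.dedup(b"AAAABBBBCCCC")
sent_by_node_b = index.dedup(b"BBBBCCCCDDDD")
```

In this sketch the second node forwards only the chunk the cluster has not yet stored, which is why grouping nodes with similar data into the same cluster reduces the volume sent to the cloud. Each index lookup across nodes costs network traffic, however, which is the other side of the tradeoff the abstract describes.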