The Communication Complexity of Distributed Epsilon-Approximations

Zengfeng Huang, Ke Yi
DOI: https://doi.org/10.1109/focs.2014.69
2017-01-01
SIAM Journal on Computing
Abstract: Data summarization is an effective approach to dealing with the "big data" problem. While data summarization problems traditionally have been studied in the streaming model, the focus is starting to shift to distributed models, as distributed/parallel computation seems to be the only viable way to handle today's massive data sets. In this paper, we study ε-approximations, a classical data summary that, intuitively speaking, preserves approximately the density of the underlying data set over a certain range space. We consider the problem of computing ε-approximations for a data set which is held jointly by k players, and give general communication upper and lower bounds that hold for any range space whose discrepancy is known.
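To make the notion of an ε-approximation concrete, here is a minimal sketch that checks the defining property for one simple range space. It assumes one-dimensional points and one-sided ranges of the form (-∞, x], and it is not the paper's protocol: the function names, the two hypothetical players, and the "every tenth point" summary are all illustrative assumptions. A summary is an ε-approximation if, for every range, the fraction of summary points in the range differs from the fraction of data points in the range by at most ε.

```python
from bisect import bisect_right
from fractions import Fraction


def density(sorted_points, x):
    """Exact fraction of points that fall in the range (-inf, x]."""
    return Fraction(bisect_right(sorted_points, x), len(sorted_points))


def approximation_error(data, summary):
    """Largest density gap between data and summary over all ranges (-inf, x]."""
    data_sorted = sorted(data)
    summary_sorted = sorted(summary)
    # Both densities are step functions that only change at a point of
    # either set, so checking those thresholds covers every range.
    thresholds = set(data_sorted) | set(summary_sorted)
    return max(
        abs(density(data_sorted, x) - density(summary_sorted, x))
        for x in thresholds
    )


def is_eps_approximation(data, summary, eps):
    """True if summary preserves every range density of data to within eps."""
    return approximation_error(data, summary) <= eps


# Hypothetical setting: k = 2 players, each holding half of the data set.
player_data = [list(range(0, 50)), list(range(50, 100))]
data = [p for part in player_data for p in part]

# Illustrative summary: every tenth point of the combined data.
summary = data[4::10]

error = approximation_error(data, summary)
print(error)
print(is_eps_approximation(data, summary, Fraction(1, 20)))
```

Exact fractions are used instead of floats so that the comparison against ε is not affected by rounding. The sketch only verifies a given summary against the full data set; it says nothing about how the k players would compute one with little communication, which is the question the paper studies.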