Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint
Kai Han, Shuang Cui, Tianshuai Zhu, Enpei Zhang, Benwei Wu, Zhizhuo Yin, Tong Xu, Shaojie Tang, He Huang
DOI: https://doi.org/10.1145/3447383
2021-01-01
Proceedings of the ACM on Measurement and Analysis of Computing Systems
Abstract: Data summarization, a fundamental methodology for selecting a representative subset of data elements from a large pool of ground data, has found numerous applications in big data processing, such as social network analysis [5, 7], crowdsourcing [6], clustering [4], network design [13], and document/corpus summarization [14]. Moreover, it is well acknowledged that the "representativeness" of a dataset in data summarization applications can often be modeled by submodularity, a mathematical concept abstracting the "diminishing returns" property in the real world. Therefore, many studies have cast data summarization as a submodular function maximization problem (e.g., [2]).
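
The "diminishing returns" property can be made concrete with a small example. The sketch below is illustrative only and is not the algorithm proposed in the paper: it assumes a toy set-coverage function as the measure of representativeness, and the names `COVERS`, `coverage`, and `marginal_gain` are hypothetical. It shows that adding the same element to a larger selection yields no more gain than adding it to a smaller one, which is the defining inequality of a submodular function: f(A ∪ {e}) − f(A) ≥ f(B ∪ {e}) − f(B) whenever A ⊆ B.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy data: each element "covers" a set of topic ids.
COVERS: Dict[str, FrozenSet[int]] = {
    "a": frozenset({1, 2, 3}),
    "b": frozenset({3, 4}),
    "c": frozenset({4, 5, 6}),
}


def coverage(selected: Set[str]) -> int:
    """f(S): number of distinct topics covered by the selected elements."""
    covered: Set[int] = set()
    for name in selected:
        covered |= COVERS[name]
    return len(covered)


def marginal_gain(element: str, selected: Set[str]) -> int:
    """f(S ∪ {e}) - f(S): the extra coverage gained by adding one element."""
    return coverage(selected | {element}) - coverage(selected)


# A is a subset of B, so the gain of adding "b" to B cannot exceed
# the gain of adding it to A.
small = {"a"}
large = {"a", "c"}
gain_small = marginal_gain("b", small)
gain_large = marginal_gain("b", large)
print(gain_small, gain_large)
```

Coverage functions are a standard example of monotone submodular functions, which is why they serve here as a stand-in for a representativeness measure; a real summarization task would substitute its own objective.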