Finding an λ-representative subset from massive data

Jin Zhang, Qiang Wei, Guoqing Chen
DOI: https://doi.org/10.1109/IFSA-NAFIPS.2013.6608466
2013-01-01
Abstract: Retrieving representative information from large-scale data has become an important research issue. This paper focuses on certain aspects of representativeness in database queries and web search, and proposes an approach to extracting a subset of the original search results that offers high coverage and low redundancy. The notion of λ-Represent is introduced based on similarities and related fuzzy operations, which makes it possible to describe the λ-Represent relationship between sets of data objects. The λ-Representative problem is then formulated as an extension of the typical set covering problem, which leads to a heuristic algorithm (namely, LamRep) that copes with the problem effectively. In LamRep, a “vote” mechanism is proposed to overcome the limitation of the naive greedy algorithm. Experiments on benchmark data show that LamRep outperforms the other approaches.
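To make the set-covering view concrete, the following is a minimal Python sketch of the naive greedy baseline that the abstract contrasts with LamRep. It is not LamRep and does not include the "vote" mechanism, whose details the abstract does not give. It assumes, purely for illustration, that an object y represents an object x when sim(x, y) >= λ; the paper's actual definition relies on fuzzy operations not spelled out here. The names greedy_lambda_cover, sim, and lam are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_lambda_cover(
    items: Sequence[Hashable],
    sim: Callable[[Hashable, Hashable], float],
    lam: float,
) -> List[Hashable]:
    """Naive greedy set cover under a similarity threshold (illustrative only).

    Assumption: item y "represents" item x when sim(y, x) >= lam.
    This is a simplification, not the paper's λ-Represent definition.
    """
    n = len(items)
    # covers[i] holds the indices of all items that item i would represent.
    covers = [
        {j for j in range(n) if sim(items[i], items[j]) >= lam}
        for i in range(n)
    ]
    uncovered = set(range(n))
    chosen: List[Hashable] = []
    while uncovered:
        # Pick the item that represents the most still-uncovered items.
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            break  # the remaining items cannot be represented by any item
        chosen.append(items[best])
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    # Toy similarity on integers: closer numbers are more similar.
    toy_sim = lambda a, b: 1 - abs(a - b) / 10
    print(greedy_lambda_cover([0, 1, 2, 8, 9], toy_sim, 0.85))  # [1, 8]
```

In the toy run, item 1 represents 0, 1, and 2, and item 8 represents 8 and 9, so two representatives cover all five objects. A higher λ demands closer matches and generally yields a larger subset.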