Abstract: We say that an object o attracts a user u if o is one of the top-k objects according to the preference function defined by u. Given a set of objects (e.g., restaurants) and a set of users, we study the problem of computing a set of representative objects under two criteria: coverage and diversity. The coverage of a set S of objects is the number of distinct users attracted by the objects in S. Although a set of objects with high coverage attracts a large number of users, all of these users may have quite similar preferences. Consequently, the set may be attractive only to a specific class of users with similar preference functions, which may disappoint users whose preferences differ widely. The diversity criterion addresses this issue by selecting a set S of objects such that the set of users attracted by each object in S is as different as possible from the sets of users attracted by the other objects in S. Existing work on representative objects considers only one of the two criteria. We are the first to consider both, with the importance of each criterion controlled by a parameter. Our algorithm has two phases. In the first phase, we prune the objects that cannot be among the representative objects and compute the set of attracted users (also called the reverse top-k) for each of the remaining objects. In the second phase, the reverse top-k sets of these objects are used to compute the representative objects that maximize coverage and diversity. Since this problem is NP-hard, the second phase employs a greedy algorithm. For time and space efficiency, we adopt MinHash and KMV synopses to assist the set operations. We prove that the proposed greedy algorithm is ϵ-approximate. Our extensive experimental study on real and synthetic data sets demonstrates the effectiveness of the proposed techniques.
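The two phases described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the pruning step and the MinHash/KMV synopses are omitted, the reverse top-k sets are computed by brute force, and the linear preference function, the objective (a weighted sum of normalized coverage and mean pairwise Jaccard distance), the parameter name `lam`, and all function names are assumptions made only for illustration.

```python
from itertools import combinations


def reverse_top_k(objects, users, k):
    """Brute-force reverse top-k: map each object id to the set of users it attracts.

    objects: dict object_id -> tuple of attribute values
    users:   dict user_id -> tuple of weights (linear preference, an assumption)
    """
    attracted = {o: set() for o in objects}
    for u, weights in users.items():
        def score(o):
            return sum(w * x for w, x in zip(weights, objects[o]))
        # Higher score is assumed to be better; ties are broken by object id.
        top = sorted(objects, key=lambda o: (-score(o), o))[:k]
        for o in top:
            attracted[o].add(u)
    return attracted


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def objective(selected, attracted, n_users, lam):
    """Illustrative score: lam * coverage + (1 - lam) * mean pairwise Jaccard distance."""
    if not selected:
        return 0.0
    covered = set().union(*(attracted[o] for o in selected))
    coverage = len(covered) / n_users
    pairs = list(combinations(selected, 2))
    if pairs:
        diversity = sum(
            jaccard_distance(attracted[a], attracted[b]) for a, b in pairs
        ) / len(pairs)
    else:
        diversity = 0.0
    return lam * coverage + (1 - lam) * diversity


def greedy_select(attracted, n_users, t, lam):
    """Pick up to t objects, each time adding the one that maximizes the objective."""
    selected = []
    candidates = set(attracted)
    while candidates and len(selected) < t:
        best = max(
            sorted(candidates),  # sorted for deterministic tie-breaking
            key=lambda o: objective(selected + [o], attracted, n_users, lam),
        )
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with three objects, three users, and k = 1, calling `greedy_select(reverse_top_k(objects, users, 1), n_users=3, t=2, lam=0.5)` returns two objects whose attracted user sets together cover the users while overlapping as little as possible. In a large-scale setting, the exact set unions and intersections here are the operations that the abstract says MinHash and KMV synopses are used to assist.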

Scalable Representative Instance Selection and Ranking

A Dataset Representativeness Metric and A Slicing Sampling Strategy for the Kennard-Stone Algorithm

Identifying the Academic Rising Stars

Identifying The Academic Rising Stars Via Pairwise Citation Increment Ranking

Representativeness-Based Instance Selection for Intrusion Detection

Discriminative Representative Selection Via Structure Sparsity

A two-step Recommendation Algorithm via Iterative Local Least Squares

Finding representative set from massive data

Representative Selection Based on Sparse Modeling.

InstanceSR: Efficient Reconstructing Small Object with Differential Instance-level Super-Resolution

Learning to Rank Collections.

Evidential instance selection for K-nearest neighbor classification of big data

Dataset Regeneration for Sequential Recommendation

Sample-Efficient Clustering and Conquer Procedures for Parallel Large-Scale Ranking and Selection

A Data-Driven Approach for Extracting Representative Information from Large Datasets with Mixed Attributes

Selecting Representative Objects Considering Coverage and Diversity.

Continuously Extracting High-Quality Representative Set from Massive Data Streams.

Replace Scoring with Arrangement: A Contextual Set-to-Arrangement Framework for Learning-to-Rank

Continuously identifying representatives out of massive streams

Per-Instance Algorithm Selection for Recommender Systems via Instance Clustering

Sample-Optimal Large-Scale Optimal Subset Selection