Generating Illustrative Snippets for Open Data on the Web

Gong Cheng, Cheng Jin, Wentao Ding, Danyun Xu, Yuzhong Qu
DOI: https://doi.org/10.1145/3018661.3018670
2017-01-01
Abstract: To embrace the open data movement, a growing number of datasets have been published on the Web for reuse. Users assessing the usefulness of an unfamiliar dataset need a means of quickly inspecting its contents. To meet this need, we propose automatically extracting an optimal small portion of a dataset, called a snippet, to concisely illustrate the dataset's contents. We consider the quality of a snippet from three aspects: coverage, familiarity, and cohesion, which are jointly formulated in a new combinatorial optimization problem called the maximum-weight-and-coverage connected graph problem (MwcCG). We give a constant-factor approximation algorithm for this NP-hard problem and evaluate our solution on real-world datasets. Our quantitative analysis and user study show that our approach outperforms a baseline approach.