HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization

Gong Cheng, Cheng Jin, Yuzhong Qu
2016-01-01
Abstract: The rapid growth of open data on the Web promotes the development of data portals that facilitate finding useful datasets. To help users quickly inspect a dataset found in a portal, we propose to summarize its contents and generate a hierarchical grouping of entities connected by relations. Our generic approach, called HIEDS, considers coverage of the dataset, height of the hierarchy, cohesion within groups, overlap between groups, and homogeneity of groups, and integrates these configurable factors into a combinatorial optimization problem. We present an efficient solution that serves users dynamically configured summaries with acceptable latency. We systematically experiment with our approach on real-world RDF datasets.
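The abstract does not give the formulas behind each factor. As a rough illustration of how configurable factors can be folded into a single objective, here is a minimal Python sketch under stated assumptions: a summary is modeled as a forest of entity groups; only three of the five factors (coverage, height, overlap) are scored, using hypothetical definitions; they are combined by a weighted sum; and the "optimization" is a brute-force pick over a few candidate hierarchies. `Group`, `score`, `best_summary`, and the weight names are illustrative only and are not HIEDS's actual method. Cohesion and homogeneity are omitted because they depend on relation details the abstract does not specify.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Group:
    """Hypothetical hierarchy node: a set of entities plus child groups."""
    entities: frozenset
    children: list = field(default_factory=list)


def iter_groups(forest):
    """Yield every group in the forest (depth-first)."""
    stack = list(forest)
    while stack:
        g = stack.pop()
        yield g
        stack.extend(g.children)


def height(forest):
    """Number of levels in the deepest branch (0 for an empty summary)."""
    def depth(g):
        return 1 + max((depth(c) for c in g.children), default=0)
    return max((depth(g) for g in forest), default=0)


def coverage(forest, entities):
    """Assumed definition: fraction of dataset entities placed in some group."""
    covered = set()
    for g in iter_groups(forest):
        covered |= g.entities
    return len(covered & entities) / len(entities) if entities else 0.0


def overlap(forest):
    """Assumed definition: mean Jaccard similarity between sibling groups."""
    sibling_sets = [list(forest)] + [g.children for g in iter_groups(forest)]
    scores = []
    for siblings in sibling_sets:
        for a, b in combinations(siblings, 2):
            union = a.entities | b.entities
            if union:
                scores.append(len(a.entities & b.entities) / len(union))
    return sum(scores) / len(scores) if scores else 0.0


def score(forest, entities, w_cov=1.0, w_height=0.1, w_overlap=0.5):
    """Hypothetical weighted objective: reward coverage, penalize height and overlap."""
    return (w_cov * coverage(forest, entities)
            - w_height * height(forest)
            - w_overlap * overlap(forest))


def best_summary(candidates, entities, **weights):
    """Brute-force stand-in for the real combinatorial optimization."""
    return max(candidates, key=lambda f: score(f, entities, **weights))


entities = frozenset({"alice", "bob", "carol", "dave"})
flat = [Group(frozenset({"alice", "bob"})), Group(frozenset({"carol"}))]
deep = [Group(entities, children=[Group(frozenset({"alice", "bob"})),
                                  Group(frozenset({"bob", "carol", "dave"}))])]
print(best_summary([flat, deep], entities) is deep)                # True
print(best_summary([flat, deep], entities, w_height=0.5) is flat)  # True
```

With the default weights the deeper hierarchy wins (score 1.0 - 0.2 - 0.125 = 0.675 versus 0.75 - 0.1 = 0.65), but raising `w_height` to 0.5 flips the choice to the flat one. This mirrors, in miniature, the idea of user-configured summaries; a practical solver would search a far larger space of groupings rather than enumerate a fixed list.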
What problem does this paper attempt to address?