Overlapping Schema Summarization Based On Multi-Label Propagation

Man Yu, Chao Wang, Xiangrui Cai, Ying Zhang, Yanlong Wen, Xiaojie Yuan
DOI: https://doi.org/10.1007/978-3-319-25255-1_29
2015-01-01
Abstract: Modern databases are usually composed of hundreds of tables. Querying an unfamiliar database is a daunting task for users who do not yet understand its schema. A schema summary can help by providing a succinct overview of the schema and improving the usability of databases. Existing summarization methods assume that each element in a database belongs to exactly one topic, ignoring the fact that some elements may belong to multiple topics. This paper proposes a new method for generating overlapping summaries; to the best of our knowledge, it is the first work to address this task. We first formulate the overlapping schema summarization problem and then apply the multi-label propagation algorithm from community detection to obtain several groups. To refine the partition, we further cluster these groups using a hierarchical clustering algorithm. Finally, we identify representative tables in each cluster to annotate the schema summary. Extensive experiments on both a benchmark database and a real-world database show that our approach not only achieves higher accuracy but also generates more meaningful summaries.
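The propagation step can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a COPRA-style variant of multi-label propagation (each table keeps every label whose belonging coefficient is at least 1/v, so it can join up to v groups), asynchronous updates in a fixed alphabetical order for determinism, a hypothetical toy schema graph whose tables and foreign-key edges are invented for illustration, and a hypothetical `representative` helper that picks the table with the largest in-group degree as a stand-in for whatever importance measure the paper uses. The hierarchical-clustering refinement step is omitted.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected weighted schema graph: table -> {neighbour table: weight}."""
    graph = defaultdict(dict)
    for a, b, w in edges:
        graph[a][b] = w
        graph[b][a] = w
    return dict(graph)


def multi_label_propagation(graph, v=2, max_iter=100, eps=1e-9):
    """COPRA-style overlapping label propagation (illustrative sketch).

    Each table holds {label: belonging coefficient}, coefficients summing to 1.
    A table keeps every label whose coefficient is at least 1/v.
    """
    labels = {node: {node: 1.0} for node in graph}  # every table starts alone
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # assumed: fixed asynchronous update order
            nbrs = graph[node]
            total = sum(nbrs.values())
            if total == 0:
                continue  # an isolated table keeps its own label
            # Weighted average of the neighbours' belonging coefficients.
            acc = defaultdict(float)
            for nbr, w in nbrs.items():
                for lab, b in labels[nbr].items():
                    acc[lab] += b * w / total
            # Keep labels at or above the 1/v threshold (eps absorbs rounding).
            kept = {lab: b for lab, b in acc.items() if b >= 1.0 / v - eps}
            if not kept:
                # Fall back to the strongest label; ties go to the smallest name.
                best = min(acc, key=lambda lab: (-acc[lab], lab))
                kept = {best: acc[best]}
            s = sum(kept.values())
            new = {lab: b / s for lab, b in kept.items()}  # renormalise to 1
            if set(new) != set(labels[node]):
                changed = True
            labels[node] = new
        if not changed:
            break  # label sets are stable
    groups = defaultdict(set)
    for node, labs in labels.items():
        for lab in labs:
            groups[lab].add(node)  # a multi-label table joins several groups
    return dict(groups)


def representative(graph, group):
    """Hypothetical stand-in: the table with the largest in-group weighted degree."""
    return max(sorted(group),
               key=lambda t: sum(w for n, w in graph[t].items() if n in group))


# Hypothetical toy schema: two table clusters bridged by "orders".
edges = [
    ("customer", "address", 1), ("customer", "account", 1), ("address", "account", 1),
    ("product", "category", 1), ("product", "supplier", 1), ("category", "supplier", 1),
    ("customer", "orders", 1), ("orders", "product", 1),
]
graph = build_graph(edges)
groups = multi_label_propagation(graph)
for lab, members in sorted(groups.items()):
    print(sorted(members), "->", representative(graph, members))
# prints:
# ['account', 'address', 'customer', 'orders'] -> customer
# ['category', 'orders', 'product', 'supplier'] -> product
```

In this toy graph the bridging table `orders` ends up in both groups, which is the kind of overlapping membership the abstract describes; a single-label propagation would have had to place it in exactly one group.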