Automatic Navbox Generation by Interpretable Clustering over Linked Entities

Chenhao Xie, Lihan Chen, Jiaqing Liang, Kezun Zhang, Yanghua Xiao, Hanghang Tong, Haixun Wang, Wei Wang
DOI: https://doi.org/10.1145/3132847.3132899
2017-01-01
Abstract: Little effort has been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table on a Wikipedia article page that provides a consistent navigation system for related entities. Navboxes are critical for the readership and editing efficiency of Wikipedia. In this paper, we target the automatic generation of Navboxes for Wikipedia articles. Instead of performing information extraction directly over unstructured natural language text, we explore an alternative avenue by focusing on a rich set of semi-structured data in Wikipedia articles: linked entities. The core idea of this paper is as follows: if we cluster the linked entities and interpret them appropriately, we can construct a high-quality Navbox for the article entity. We propose a clustering-then-labeling algorithm to realize this idea. Experiments show that the proposed solutions are effective. Ultimately, our approach enriches Wikipedia with 1.95 million new high-quality Navboxes.
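To make the clustering-then-labeling idea concrete, here is a minimal toy sketch. It is not the authors' algorithm: the abstract does not specify the similarity measure, the clustering method, or the labeling rule, so everything below (the category tags in `LINKED`, the Jaccard similarity, the greedy clustering, the `threshold` value, and the label scoring) is an assumption chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy data: entities linked from one article, each with made-up tags.
LINKED = {
    "Paris": {"city", "france"},
    "Lyon": {"city", "france"},
    "Seine": {"river", "france"},
    "Loire": {"river", "france"},
}

def jaccard(a, b):
    """Jaccard similarity of two tag sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(entities, threshold=0.5):
    """Greedy single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for name in sorted(entities):
        for members in clusters:
            if jaccard(entities[name], entities[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def label(members, entities):
    """Pick the tag that is frequent inside the cluster and rare outside it."""
    inside = Counter(t for m in members for t in entities[m])
    outside = Counter(
        t for name, tags in entities.items() if name not in members for t in tags
    )
    return max(sorted(inside), key=lambda t: inside[t] - outside[t])

def build_navbox(entities, threshold=0.5):
    """Step 1: cluster the linked entities. Step 2: label each cluster."""
    return {label(c, entities): c for c in cluster(entities, threshold)}

if __name__ == "__main__":
    for group, members in build_navbox(LINKED).items():
        print(f"{group}: {', '.join(members)}")
```

In this toy setting the tag "france" is shared by every entity, so it does not distinguish any group and is never chosen as a label. That reflects the interpretability requirement in the abstract: a cluster label should describe what its members have in common that the other linked entities lack.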