Domain Thesaurus Construction from Wikipedia

WenKe Yin, Ming Zhu, TianHao Chen
DOI: https://doi.org/10.2991/iccnce.2013.22
2013-01-01
Abstract: Domain thesauri play an important role in information retrieval, natural language processing (NLP), question answering systems, and related fields. Because natural language is complex, NLP-based thesaurus construction methods have difficulty achieving the desired results. In recent years, Wiki has been widely used as a knowledge base. Based on two characteristics of hyperlinks, anchor description and topic locality, this paper proposes a domain thesaurus construction method based on clustering a hyperlink structure graph. The method first constructs a domain-specific hyperlink structure graph from Wiki and uses the LSI algorithm to calculate the weight of each hyperlink. It then uses the CPMw algorithm to cluster the resulting weighted, undirected hyperlink structure graph, and the clusters yield the domain thesaurus. Experiments show that the method achieves better results.
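The clustering step can be illustrated with a minimal sketch of weighted clique percolation, which is how CPMw is commonly described: k-cliques whose intensity (the geometric mean of their edge weights) reaches a threshold are kept, and cliques sharing k-1 nodes are merged into one cluster. The terms, edge weights, function names, and the values of k and threshold below are all hypothetical; in the paper the weights come from LSI, and its actual parameters are not given in the abstract.

```python
from itertools import combinations
from math import prod

def k_cliques(weights, k):
    """Yield every k-node clique of the undirected graph as a sorted tuple."""
    nodes = sorted({n for edge in weights for n in edge})
    for group in combinations(nodes, k):
        if all(frozenset(p) in weights for p in combinations(group, 2)):
            yield group

def intensity(weights, clique):
    """Geometric mean of the edge weights inside a clique."""
    ws = [weights[frozenset(p)] for p in combinations(clique, 2)]
    return prod(ws) ** (1.0 / len(ws))

def weighted_clique_percolation(weights, k=3, threshold=0.5):
    """Merge k-cliques with intensity >= threshold that share k-1 nodes."""
    cliques = [c for c in k_cliques(weights, k)
               if intensity(weights, c) >= threshold]
    parent = list(range(len(cliques)))  # union-find over retained cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(set(cliques[i]) & set(cliques[j])) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical hyperlink weights between article titles (made up for
# illustration; the paper derives them with LSI).
edges = {
    ("search engine", "web crawler"): 0.9,
    ("search engine", "inverted index"): 0.8,
    ("web crawler", "inverted index"): 0.7,
    ("inverted index", "query expansion"): 0.8,
    ("search engine", "query expansion"): 0.9,
    ("query expansion", "football"): 0.1,
    ("search engine", "football"): 0.1,
}
weights = {frozenset(e): w for e, w in edges.items()}
communities = weighted_clique_percolation(weights, k=3, threshold=0.5)
print(communities)
```

In this toy graph the weakly linked term "football" forms only a low-intensity triangle, so it is filtered out, while the four retrieval-related terms end up in one cluster that could serve as a thesaurus entry group.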