Refining Hierarchies Of Public Knowledge Spheres By Mutual Awareness Of Keywords - Towards A More Versatile Wikipedia

Yang Yu, Zhangxi Lin, Qing Li, Guoping Xia
2008-01-01
Abstract: Wikipedia is the most complete encyclopedia in history (http://en.wikipedia.org/wiki/), with 2,210,000 articles and more than 100 million links among them. Web mining of the spheres of knowledge in Wikipedia will help us discover the anatomy of the public knowledge structure and understand how knowledge is accumulated and refined under a well-conceived incentive mechanism. Current Wikipedia spheres are formed by mutual referencing among keywords. The structure of the link hierarchies is not optimized, because these hierarchies were built manually and conceived with simple keyword matching. This paper focuses on improving the efficiency of knowledge retrieval from the Wikipedia spheres by making the links among them more accurate. It defines a new measure of distance in terms of mutual awareness of keywords, which is underpinned by computational models. The measure of mutual awareness can then be used to rank and reorient the keywords and to discover the closest keyword clusters, which can significantly increase the accuracy of knowledge retrieval in Wikipedia. We have collected more than 99% of the keywords and their links from wikipedia.org. The application of hierarchical clustering resulted in important findings.
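The abstract does not give the formula for the mutual-awareness distance, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical score that combines reciprocal links between two keywords with the overlap of their other link targets, and it uses a plain single-linkage agglomerative procedure as a stand-in for the hierarchical clustering mentioned above. The function names, the 50/50 weighting, the threshold, and the toy link data are all assumptions made for illustration.

```python
from itertools import combinations


def mutual_awareness(links, a, b):
    """Hypothetical score in [0, 1]; NOT the paper's definition.

    Half of the score comes from direct links between a and b (in either
    direction), half from the Jaccard overlap of their other link targets.
    """
    direct = (b in links.get(a, set())) + (a in links.get(b, set()))  # 0, 1 or 2
    out_a = links.get(a, set()) - {b}
    out_b = links.get(b, set()) - {a}
    union = out_a | out_b
    shared = len(out_a & out_b) / len(union) if union else 0.0
    return 0.5 * (direct / 2) + 0.5 * shared


def distance(links, a, b):
    """Distance shrinks as mutual awareness grows."""
    return 1.0 - mutual_awareness(links, a, b)


def linkage_gap(links, c1, c2):
    """Single linkage: the smallest keyword-to-keyword distance between two clusters."""
    return min(distance(links, a, b) for a in c1 for b in c2)


def cluster(links, threshold):
    """Merge the two closest clusters repeatedly until the closest gap exceeds threshold."""
    clusters = [{k} for k in sorted(links)]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_gap(links, clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage_gap(links, clusters[i], clusters[j]) > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe: combinations always yields i < j
    return clusters


# Toy link graph (invented for illustration): keyword -> keywords it links to.
links = {
    "Python": {"Programming language", "Guido van Rossum"},
    "Programming language": {"Python", "Compiler"},
    "Compiler": {"Programming language"},
    "Jazz": {"Blues", "Saxophone"},
    "Blues": {"Jazz", "Saxophone"},
}

result = cluster(links, threshold=0.6)
```

On this toy graph, with the assumed threshold of 0.6, the keywords separate into a programming group and a music group. Any real application would need the paper's actual distance definition and a clustering method that scales to millions of articles.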