Effective Chinese Organization Name Linking to a List-Like Knowledge Base.
Chengyuan Xue, Haofen Wang, Bo Jin, Mengjie Wang, Daqi Gao
DOI: https://doi.org/10.1007/978-3-662-45495-4_9
2014-01-01
Abstract: Entity linking is widely used in entity retrieval and semantic search. It links mentions in unstructured documents to their representations in a knowledge base (KB). Frequently used KBs (e.g., Wikipedia) usually contain abundant information about each entity, such as properties, name variants, and text descriptions, which helps to find candidates and disambiguate links. In this paper, we link organization names in Chinese documents to a list-like KB. Unlike typical KBs, the records in our KB are only the full names of Chinese organizations. The many name variants and abbreviations in the documents cannot be matched directly to any organization name in the KB and introduce ambiguity, which makes the linking task difficult. First, we enrich the KB with abbreviations: using information from Hudong Baike and other sources, we design a pattern-based full name annotation method that helps generate abbreviations for all names in the KB. To resolve the ambiguity, we propose a two-stage link generation approach that exploits the co-occurrence of abbreviations and full names in the same document or document cluster: the full names linked in the first stage constrain the linking of abbreviations in the second stage. We apply our approach to a corpus of police inquiry documents. The experimental results show that our approach is effective and significantly outperforms the one-stage approach in terms of both precision and recall.
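The abstract describes abbreviation enrichment and two-stage linking only at a high level. The following is a minimal sketch of the idea under stated assumptions, not the authors' implementation. The toy KB, the LOC/CORE/TYPE segment labels (standing in for the output of the pattern-based full name annotation), the two abbreviation rules, and the functions `generate_abbreviations`, `build_abbreviation_index`, and `two_stage_link` are all hypothetical. Mention detection is plain substring matching. Stage 1 links exact full-name mentions in a document cluster. Stage 2 resolves an abbreviation to a stage-1 full name when exactly one of its candidates was linked there; otherwise it links only when the abbreviation has a single candidate.

```python
from collections import defaultdict

# Hypothetical toy KB: each full name is pre-segmented into (word, label)
# pairs. The LOC/CORE/TYPE labels are an assumption standing in for the
# paper's pattern-based full name annotation.
KB = {
    "北京市公安局": [("北京市", "LOC"), ("公安局", "TYPE")],
    "上海市公安局": [("上海市", "LOC"), ("公安局", "TYPE")],
    "北京大学": [("北京", "CORE"), ("大学", "TYPE")],
}


def generate_abbreviations(segments):
    """Toy abbreviation rules (hypothetical, not the paper's patterns)."""
    words = [w for w, _ in segments]
    labels = [lab for _, lab in segments]
    abbrs = set()
    if "LOC" in labels:
        # Rule 1: drop the location prefix, e.g. 北京市公安局 -> 公安局.
        rest = "".join(w for w, lab in segments if lab != "LOC")
        if rest:
            abbrs.add(rest)
    elif len(words) >= 2:
        # Rule 2: first character of every word, e.g. 北京大学 -> 北大.
        abbrs.add("".join(w[0] for w in words))
    return abbrs


def build_abbreviation_index(kb):
    """Map each generated abbreviation to every full name that yields it."""
    index = defaultdict(set)
    for full_name, segments in kb.items():
        for abbr in generate_abbreviations(segments):
            index[abbr].add(full_name)
    return index


def two_stage_link(cluster, kb, index):
    """Link organization mentions in a document cluster (list of strings)."""
    # Stage 1: exact full-name matches, longest names first.
    linked_full = set()
    masked_docs = []
    for doc in cluster:
        for name in sorted(kb, key=len, reverse=True):
            if name in doc:
                linked_full.add(name)
                # Mask the span so its substrings are not re-read as abbreviations.
                doc = doc.replace(name, "#" * len(name))
        masked_docs.append(doc)

    # Stage 2: abbreviations, constrained by the stage-1 links in the cluster.
    abbr_links = {}
    for abbr, candidates in index.items():
        if not any(abbr in doc for doc in masked_docs):
            continue
        constrained = candidates & linked_full
        if len(constrained) == 1:
            abbr_links[abbr] = next(iter(constrained))
        elif len(candidates) == 1:
            abbr_links[abbr] = next(iter(candidates))
        else:
            abbr_links[abbr] = None  # still ambiguous, left unlinked
    return linked_full, abbr_links


if __name__ == "__main__":
    index = build_abbreviation_index(KB)
    cluster = ["上海市公安局接到报案。", "公安局随后立案调查。北大学生协助。"]
    full, abbrs = two_stage_link(cluster, KB, index)
    print(full)   # {'上海市公安局'}
    print(abbrs)  # {'公安局': '上海市公安局', '北大': '北京大学'}
```

In this toy cluster, "公安局" has two KB candidates. It is resolved only because "上海市公安局" was linked in stage 1. Without a co-occurring full name, as in a one-stage setting, it remains ambiguous and is returned as `None`.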