LSA-Based Chinese-Slavic Mongolian NER Disambiguation.

Jiang Yupeng, Hou Hongxu, Yang Ping
DOI: https://doi.org/10.1109/cit/iucc/dasc/picom.2015.102
2015-01-01
Abstract: Named entity ambiguity arises when one named entity corresponds to multiple entity concepts. Using the textual context and external repositories, we can resolve this ambiguity and determine the true referent of a named entity. Such disambiguation can improve the performance of online recommendation systems, information extraction, and other practical applications of information retrieval. However, research that specifically targets Mongolian named entity disambiguation is still in its infancy. Because the same words and entities often carry different semantic themes in a Mongolian document (for example, "Bartel" can refer to heroes in general or to a specific person), errors of this kind in named entity recognition need to be disambiguated. Knowledge resources for Mongolian are relatively scarce, and the external repositories available for disambiguation are relatively narrow. In this paper, we apply LSA (Latent Semantic Analysis) to Mongolian text clusters. To address the data sparseness caused by Mongolian derivational and inflectional suffixes, we apply stemming and build the vector space on Mongolian roots. We use SVD (singular value decomposition) to reduce the dimension of the vector space and identify latent semantic relationships between words. We also propose a generative probabilistic model that leverages heterogeneous entity knowledge (including popularity knowledge, name knowledge, and context knowledge) for entity disambiguation. The results show that the proposed framework can resolve Mongolian entity ambiguity.
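The LSA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the documents have already been stemmed, uses raw counts rather than any weighting scheme the paper may apply, and relies only on NumPy. The function names (`build_term_document_matrix`, `lsa_document_vectors`, `cosine`) and the English placeholder stems are hypothetical and stand in for Mongolian roots.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Build a stem-by-document count matrix from lists of stems."""
    vocab = sorted({stem for doc in docs for stem in doc})
    index = {stem: i for i, stem in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for stem in doc:
            matrix[index[stem], j] += 1.0
    return vocab, matrix

def lsa_document_vectors(matrix, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    # Each row is one document, scaled by the retained singular values.
    return (np.diag(s[:k]) @ vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity, returning 0.0 for zero vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Placeholder stems (assumption): two contexts about heroes in general,
# two contexts about a specific person.
docs = [
    ["hero", "battle", "epic"],
    ["hero", "epic", "legend"],
    ["person", "city", "born"],
    ["person", "born", "school"],
]

vocab, matrix = build_term_document_matrix(docs)
vecs = lsa_document_vectors(matrix, k=2)

same_sense = cosine(vecs[0], vecs[1])   # both "hero" contexts
other_sense = cosine(vecs[0], vecs[2])  # "hero" vs. "person" context
print(same_sense > other_sense)
```

In this toy setting, contexts that share a sense end up closer in the reduced space than contexts with different senses, which is the property a disambiguation step could compare against candidate entities.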