Identifying Word Translations in Scientific Literature Based on Labeled Bilingual Topic Model and Co-occurrence Features

Mingjie Tian, Yahui Zhao, Rongyi Cui
DOI: https://doi.org/10.1007/978-3-030-01716-3_7
2018-01-01
Abstract: To exploit the increasingly rich multilingual information resources and multi-label data in scientific literature, and to mine the correlations between languages, this paper proposes a labeled bilingual topic model and a similarity metric based on co-occurrence features, both of which can be applied to the word translation identification task. First, on the assumption that the keywords of a scientific article are relevant to the abstract of the same article, the keywords are extracted and treated as labels; the labels are associated with topics, so that the “latent” topics are instantiated. Second, the abstracts of the articles are used to train the labeled bilingual topic model, which yields a representation of each word over the topic distribution. Finally, the most similar words across the two languages are matched using the proposed similarity metric. The experimental results show that the labeled bilingual topic model achieves better precision than a bilingual model based on “latent” topics, and that co-occurrence features strengthen the attraction between bilingual word pairs, which improves identification performance.
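The final matching step can be illustrated with a minimal sketch. It assumes that each word in either language is already represented as a distribution over a shared set of labeled topics, and it picks the target word with the highest similarity for each source word. The abstract does not give the paper's actual metric, so cosine similarity, the linear interpolation with a co-occurrence score, the weight `alpha`, and every name and number below are illustrative assumptions rather than the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def match_translations(src, tgt, cooc=None, alpha=0.5):
    """Pick the most similar target word for each source word.

    src, tgt: dict mapping a word to its distribution over shared topics.
    cooc: optional dict mapping (src_word, tgt_word) to a score in [0, 1].
    alpha: hypothetical weight of the co-occurrence score (an assumption,
           not a parameter taken from the paper).
    """
    matches = {}
    for s_word, s_vec in src.items():
        def score(t_word):
            base = cosine(s_vec, tgt[t_word])
            if cooc is None:
                return base
            # Assumed combination: linear interpolation of both signals.
            return (1 - alpha) * base + alpha * cooc.get((s_word, t_word), 0.0)

        matches[s_word] = max(tgt, key=score)
    return matches


# Toy word representations over three labeled topics (made-up values).
src_words = {"src_a": [0.8, 0.1, 0.1], "src_b": [0.1, 0.8, 0.1]}
tgt_words = {
    "tgt_x": [0.1, 0.7, 0.2],
    "tgt_y": [0.7, 0.2, 0.1],
    "tgt_z": [0.2, 0.2, 0.6],
}

print(match_translations(src_words, tgt_words))
```

With topic similarity alone, each source word is paired with the target word whose topic distribution points in the most similar direction. Passing a `cooc` dictionary lets a strong co-occurrence score shift the choice toward a pair that the topic representation alone would rank lower.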