Bilingual lexicon induction from non-parallel corpora

Meng ZHANG, Yang LIU, Maosong SUN
DOI: https://doi.org/10.1360/n112017-00256
2018-01-01
Abstract: In cross-lingual natural language processing, the lack of parallel data is a serious problem, and one that is common for languages with scarce resources. In such settings, making better use of the translational equivalence encoded in non-parallel corpora becomes more important. Because the corpora are not parallel, acquiring translational equivalence must be treated as a small-data or unsupervised learning problem, and the result usually takes the form of a bilingual lexicon. This is both an important research problem in artificial intelligence and of significant practical value in low-resource scenarios. This paper introduces a series of studies that address problems in prior research, exploring from various perspectives how to obtain better bilingual lexica from non-parallel corpora.