Bilingual Lexicon Extraction Using Locally Weighted Linear Regression from Comparable Corpora.

Chunyue Zhang, Tiejun Zhao
DOI: https://doi.org/10.1109/ialp.2015.7451520
2015-01-01
Abstract: Recently, a simple linear transformation of word embeddings has been found to be highly effective for extracting a bilingual lexicon from comparable corpora. However, transforming all words with a single transformation matrix tends to underfit. This paper proposes a simple non-parametric solution using locally weighted linear regression (LWR), in which words in the training lexicon that are closer to the target word carry more weight in the regression objective. The experimental results confirm that the proposed solution achieves a 36.7% relative improvement at Top-1 over the baseline approach on the English-to-Chinese bilingual lexicon extraction task.
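The idea of weighting nearby seed-lexicon pairs more heavily can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` and `ridge` parameters, the cosine-similarity ranking, and the function names `lwr_translate` and `top_k` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def lwr_translate(x, src_train, tgt_train, bandwidth=1.0, ridge=1e-6):
    """Map one source embedding x into the target space with a locally fitted linear map.

    src_train: (n, d_src) source embeddings of the seed lexicon.
    tgt_train: (n, d_tgt) target embeddings of their translations.
    The kernel, bandwidth, and ridge term are illustrative assumptions.
    """
    # Gaussian kernel weights: seed pairs whose source vector is closer to x count more.
    sq_dist = np.sum((src_train - x) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * bandwidth ** 2))

    # Weighted least squares: minimise sum_i w_i * ||x_i M - z_i||^2,
    # with a small ridge term added for numerical stability.
    xtw = src_train.T * w                      # shape (d_src, n)
    a = xtw @ src_train + ridge * np.eye(src_train.shape[1])
    b = xtw @ tgt_train
    m = np.linalg.solve(a, b)                  # shape (d_src, d_tgt)
    return x @ m


def top_k(z, tgt_vocab_vecs, k=1):
    """Return indices of the k target vectors most cosine-similar to z."""
    norms = np.linalg.norm(tgt_vocab_vecs, axis=1) * np.linalg.norm(z)
    sims = tgt_vocab_vecs @ z / np.maximum(norms, 1e-12)
    return np.argsort(-sims)[:k]
```

In contrast to the single-matrix baseline, a separate matrix is solved for each query word here, so the cost per query grows with the size of the seed lexicon. A Top-1 candidate would be obtained with `top_k(lwr_translate(x, src_train, tgt_train), tgt_vocab_vecs, k=1)`, where `tgt_vocab_vecs` is a hypothetical matrix of target-language embeddings.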