Bilingual Lexicon Extraction with Forced Correlation from Comparable Corpora.

Chunyue Zhang, Tiejun Zhao
DOI: https://doi.org/10.1007/978-3-319-26535-3_60
2015-01-01
Abstract: A simple linear transformation between word embedding spaces has recently been found to be highly effective for extracting a bilingual lexicon from comparable corpora. However, the bilingual word embedding pairs used to train this transformation are assumed to satisfy a linear relationship, which cannot be guaranteed in practice. This paper proposes a simple solution based on canonical correlation analysis (CCA), which forces the bilingual word embeddings used for training to be maximally linearly correlated in the projection subspaces. After the original word embeddings in both languages are projected into the new correlated subspaces, a better transformation matrix is learned from the projected embeddings in the same way as before. Experimental results confirm that the proposed solution achieves a significant improvement of 62% in precision at Top-1 over the baseline approach on the English-to-Chinese bilingual lexicon extraction task.
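The pipeline the abstract describes (project both embedding spaces with CCA, then learn a linear map between the projected spaces) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the ridge term `reg`, the closed-form least-squares fit, the cosine nearest-neighbour retrieval, and the synthetic data are all assumptions, since the abstract does not specify the optimizer, dimensionalities, or retrieval metric.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-6):
    """Fit CCA on row-aligned pairs X (n x dx) and Y (n x dy).
    Returns the means, projection matrices A (dx x k) and B (dy x k),
    and the top-k canonical correlations. `reg` is an assumed ridge term."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each side with its Cholesky factor, then take the SVD of the
    # whitened cross-covariance Lx^{-1} Cxy Ly^{-T}.
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    A = np.linalg.solve(Lx.T, U[:, :k])     # canonical directions for X
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])  # canonical directions for Y
    return mx, my, A, B, s[:k]

def project(V, mean, P):
    """Centre the embeddings and map them into the correlated subspace."""
    return (V - mean) @ P

def learn_transform(Px, Py):
    """Least-squares W minimising ||Px W - Py||_F (assumed training objective)."""
    W, *_ = np.linalg.lstsq(Px, Py, rcond=None)
    return W

def top1_indices(Qx, Py):
    """Index of the cosine-nearest target row for every mapped source row."""
    q = Qx / np.linalg.norm(Qx, axis=1, keepdims=True)
    t = Py / np.linalg.norm(Py, axis=1, keepdims=True)
    return np.argmax(q @ t.T, axis=1)

def synthetic_pairs(n=200, latent=5, dx=10, dy=8, noise=0.01, seed=0):
    """Toy stand-in for seed-dictionary embedding pairs (not real data):
    both 'languages' are noisy linear views of a shared latent vector."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, latent))
    X = Z @ rng.normal(size=(latent, dx)) + noise * rng.normal(size=(n, dx))
    Y = Z @ rng.normal(size=(latent, dy)) + noise * rng.normal(size=(n, dy))
    return X, Y

if __name__ == "__main__":
    X, Y = synthetic_pairs()
    mx, my, A, B, corr = fit_cca(X, Y, k=5)
    Px, Py = project(X, mx, A), project(Y, my, B)
    W = learn_transform(Px, Py)
    pred = top1_indices(Px @ W, Py)
    print("canonical correlations:", np.round(corr, 3))
    print("precision at Top-1 on toy pairs:", np.mean(pred == np.arange(len(X))))
```

In a real setting, `X` and `Y` would hold the embeddings of seed-dictionary word pairs, and retrieval would run against the projected embeddings of the full target vocabulary rather than the training pairs used here.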