Learning Multilingual Sentence Embeddings From Monolingual Corpus

Shuai Wang, Lei Hou, Juanzi Li, Meihan Tong, Jiabo Jiang
DOI: https://doi.org/10.1007/978-3-030-32381-3_28
2019-01-01
Abstract: Learning multilingual sentence embeddings usually requires large numbers of parallel sentences, which are difficult to obtain. We propose a novel self-learning approach that learns multilingual sentence embeddings from monolingual corpora. Our assumption is that, regardless of language, sentences appearing in similar contexts are similar. We therefore first train monolingual sentence embeddings for different languages with shared parameters as initialization. We then iteratively extract similar sentence pairs and exchange their positions regardless of language. Finally, we predict the similarity between the sentences of each pair from their relations to their new contexts. Our experiments show that the proposed approach outperforms existing unsupervised approaches and is competitive with supervised approaches.
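The extract-and-exchange step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how similar pairs are selected, so the mutual-nearest-neighbour rule, the cosine threshold, the function names, and the toy embeddings below are all assumptions made for illustration. The training step that predicts similarity from the new contexts is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mutual_nearest_pairs(emb_a, emb_b, threshold=0.9):
    """Return (i, j) where emb_a[i] and emb_b[j] are each other's nearest
    neighbour and their similarity reaches the (assumed) threshold."""
    sims = [[cosine(u, v) for v in emb_b] for u in emb_a]
    best_b = [max(range(len(emb_b)), key=lambda j: row[j]) for row in sims]
    best_a = [max(range(len(emb_a)), key=lambda i: sims[i][j])
              for j in range(len(emb_b))]
    return [(i, j) for i, j in enumerate(best_b)
            if best_a[j] == i and sims[i][j] >= threshold]

def swap_positions(corpus_a, corpus_b, pairs):
    """Exchange the paired sentences so each sits in the other's context."""
    new_a, new_b = list(corpus_a), list(corpus_b)
    for i, j in pairs:
        new_a[i], new_b[j] = corpus_b[j], corpus_a[i]
    return new_a, new_b

# Toy data: hypothetical 2-d embeddings standing in for trained ones.
corpus_en = ["the cat rests", "it rains today", "i like tea"]
corpus_de = ["es regnet heute", "die Katze ruht", "der Zug ist spaet"]
emb_en = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
emb_de = [[0.05, 1.0], [1.0, 0.1], [-1.0, 0.2]]

pairs = mutual_nearest_pairs(emb_en, emb_de)
mixed_en, mixed_de = swap_positions(corpus_en, corpus_de, pairs)
```

In an iterative setting, the embeddings would be retrained on the mixed corpora and the pair extraction repeated. Requiring mutual nearest neighbours is one conservative way to limit noisy pairs in early rounds, chosen here only to keep the sketch concrete.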