A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings

Liangchen Wei, Zhi-Hong Deng
DOI: https://doi.org/10.24963/ijcai.2017/582
2017-01-01
Abstract: Cross-language learning allows one to use training data from one language to build models for another language. Many traditional approaches require word-level alignment of sentences from parallel corpora. In this paper, we define a general bilingual training objective function that requires only a sentence-level parallel corpus. We propose a variational autoencoding approach for training bilingual word embeddings. The variational model introduces a continuous latent variable to explicitly model the underlying semantics of the parallel sentence pairs and to guide the generation of the sentence pairs. Our model restricts the bilingual word embeddings to represent words in exactly the same continuous vector space. Empirical results on the task of cross-lingual document classification show that our method is effective.