Learning Chinese-Japanese Bilingual Word Embedding by Using Common Characters.

Wang Jilei, Luo Shiying, Li Yanning, Xia Shu-Tao
DOI: https://doi.org/10.1007/978-3-319-47650-6_7
2016-01-01
Abstract: Bilingual word embedding, which maps the word embeddings of two languages into one vector space, has been widely applied in machine translation, word sense disambiguation, and other tasks. However, no model has been universally accepted for learning bilingual word embeddings. In this work, we propose a novel model named CJ-BOC to learn Chinese-Japanese word embeddings. Because Chinese and Japanese share a large portion of common characters, we exploit these characters in our training process. We demonstrate the effectiveness of this approach through both theoretical and experimental study. To evaluate the performance of CJ-BOC, we conducted a comprehensive experiment, which shows both its speed advantage and the high quality of the acquired word embeddings.