Word-to-word Machine Translation: Bilateral Similarity Retrieval for Mitigating Hubness

Mengting Luo,Linchao He,Mingyue Guo,Fei Han,Long Tian,Haibo Pu,Dejun Zhang
DOI: https://doi.org/10.1088/1757-899x/533/1/012051
2019-01-01
IOP Conference Series Materials Science and Engineering
Abstract:Nearest neighbor search is playing a critical role in machine word translation, due to its ability to obtain the lingual labels of source word embeddings by searching k Nearest Neighbor ( k NN) target embeddings from a shared bilingual semantic space. However, aligning two language distributions into a shared space usually requires amounts of target label, and k NN retrieval causes hubness problem in high-dimensions feature space. Although most the best-k retrievals get rid of hubs in the list of translation candidates to mitigate the hubness problem, it is flawed to eliminate hubs. Because hub also has a correct source word query corresponding to it and should not be crudely excluded. In this paper, we introduce an unsupervised machine word translation model based on Generative Adversarial Nets (GANs) with Bilingual Similarity retrieval, namely, Unsupervised-BSMWT. Our model addresses three main challenges: (1) reduce the dependence of parallel data with GANs in a fully unsupervised way. (2) Significantly decrease the training time of adversarial game. (3) Propose a novel Bilingual Similarity retrieval for mitigating hubness pollution regardless of whether it is a hub. Our model efficiently performs competitive results in 74min exceeding previous GANs-based models.
What problem does this paper attempt to address?