Training Code-Switching Language Model with Monolingual Data.

Shun-Po Chuang, Tzu-Wei Sung, Hung-yi Lee
DOI: https://doi.org/10.1109/icassp40776.2020.9053775
2020-01-01
Abstract: The lack of code-switching data is an obstacle to training code-switching language models. In this paper, we propose an approach to train code-switching language models with monolingual data only. By constraining and normalizing the output projection matrix in an RNN-based language model, we make the embeddings of different languages close to each other. With numerical and visualized results, we show that the proposed approaches remarkably improve code-switching language models trained on monolingual data. The proposed approaches are comparable to, or even better than, training a code-switching language model with artificially generated code-switching data. Furthermore, we use unsupervised bilingual word translation to analyze whether semantically equivalent words in different languages are mapped together.
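The abstract does not spell out the exact form of the constraint or the normalization, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that each row of the output projection matrix is one word's output embedding, that normalization means scaling each row to unit L2 norm, and that the constraint is a penalty on the cosine distance between the mean embeddings of the two languages. The function names and the toy matrix are hypothetical.

```python
import math


def normalize_rows(matrix):
    """Scale each row (assumed: one output embedding per word) to unit L2 norm."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        # Leave all-zero rows unchanged to avoid division by zero.
        normalized.append([x / norm for x in row] if norm > 0 else list(row))
    return normalized


def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def language_gap_penalty(projection, lang1_ids, lang2_ids):
    """Assumed constraint: distance between the two languages' mean embeddings.

    A term like this could be added to the language-model loss so that
    training pulls the two embedding clusters toward each other.
    """
    rows = normalize_rows(projection)
    mean1 = mean_vector([rows[i] for i in lang1_ids])
    mean2 = mean_vector([rows[i] for i in lang2_ids])
    return cosine_distance(mean1, mean2)


# Toy projection matrix: rows 0-1 stand in for language A, rows 2-3 for language B.
W = [[3.0, 4.0], [6.0, 8.0], [0.0, 2.0], [0.0, 5.0]]
penalty = language_gap_penalty(W, [0, 1], [2, 3])
print(penalty)
```

In this sketch a smaller penalty means the two languages' embeddings point in more similar directions on average. Whether the paper uses this particular distance or a different one is not stated in the abstract.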