Enhanced Double-Carrier Word Embedding Via Phonetics and Writing
Wenhao Zhu, Xin Jin, Shuang Liu, Zhiguo Lu, Wu Zhang, Ke Yan, Baogang Wei
DOI: https://doi.org/10.1145/3344920
IF: 1.471
2020-01-01
ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract: Word embeddings, which map words into a unified vector space, capture rich semantic information. From a linguistic point of view, words have two carriers: speech and writing. Yet most recent word embedding models focus only on the writing carrier and ignore the role of speech in semantic expression. In the development of language, however, speech appears before writing and plays an important role in the development of writing. For phonetic language systems, written forms are secondary symbols of spoken ones. Based on this idea, we propose double-carrier word embedding (DCWE). DCWE simulates the order in which speech and writing arise: we train written embeddings on the basis of phonetic embeddings, and the final word embedding fuses the written and phonetic embeddings. To illustrate that our model can be applied to most languages, we selected Chinese, English, and Spanish as examples and evaluated the resulting models through word similarity and text classification experiments.
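The abstract states that the final embedding fuses written and phonetic embeddings and that the models are evaluated by word similarity, but it does not say how the fusion is done. The sketch below is a minimal illustration under stated assumptions, not the DCWE method: it assumes fusion by concatenating L2-normalized written and phonetic vectors (a common, simple choice), and it scores embeddings with the standard word-similarity protocol (cosine similarity against human ratings, compared by Spearman rank correlation). The functions `fuse` and `word_similarity`, the toy vocabulary, and the made-up human scores are all hypothetical; DCWE's training of written embeddings from phonetic ones is not reproduced here.

```python
import numpy as np


def fuse(phonetic: dict[str, np.ndarray], written: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Hypothetical fusion (assumption, not DCWE's rule): unit-normalize each
    carrier's vector and concatenate [written; phonetic] for shared words."""
    fused = {}
    for word, w_vec in written.items():
        if word not in phonetic:
            continue  # only words present in both carriers are fused
        w = w_vec / np.linalg.norm(w_vec)
        p = phonetic[word] / np.linalg.norm(phonetic[word])
        fused[word] = np.concatenate([w, p])
    return fused


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_average(x) -> np.ndarray:
    """1-based ranks, with tied values assigned their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman(x, y) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rank_average(x), rank_average(y))[0, 1])


def word_similarity(emb: dict[str, np.ndarray], pairs) -> float:
    """Correlate model cosine similarities with human scores, skipping OOV pairs."""
    model, human = [], []
    for w1, w2, score in pairs:
        if w1 in emb and w2 in emb:
            model.append(cosine(emb[w1], emb[w2]))
            human.append(score)
    return spearman(model, human)


# Toy usage: random vectors and made-up ratings, for illustration only.
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "car", "bus"]
phonetic = {w: rng.normal(size=4) for w in vocab}
written = {w: rng.normal(size=8) for w in vocab}
fused = fuse(phonetic, written)
pairs = [("cat", "dog", 8.5), ("car", "bus", 7.9), ("cat", "car", 1.2), ("dog", "bus", 1.0)]
rho = word_similarity(fused, pairs)
print(f"Spearman rho on toy pairs: {rho:.3f}")
```

Concatenation keeps both carriers' information side by side, and normalizing first stops either carrier from dominating cosine similarity merely because its vectors are longer. A weighted sum or a learned projection would be equally plausible alternatives.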