Learning Distributed Representations Of Uyghur Words And Morphemes

Halidanmu Abudukelimu, Yang Liu, Xinxiong Chen, Maosong Sun, Abudoukelimu Abulizi
DOI: https://doi.org/10.1007/978-3-319-25816-4_17
2015-01-01
Abstract: While distributed representations have proven very successful in a variety of NLP tasks, learning distributed representations for agglutinative languages such as Uyghur still faces a major challenge: most words are composed of many morphemes and occur only once in the training data. To address this data sparsity problem, we propose an approach to learn distributed representations of Uyghur words and morphemes from unlabeled data. The central idea is to treat morphemes rather than words as the basic unit of representation learning. We annotate a Uyghur word similarity dataset and show that our approach achieves significant improvements over CBOW, a state-of-the-art model for computing vector representations of words.
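The central idea of using morphemes as the basic unit can be illustrated with a minimal sketch. It assumes words are already segmented into morphemes and shows only how CBOW-style (context, target) training pairs could be drawn from the morpheme sequence instead of the word sequence. The function names, the window size, and the placeholder morphemes (`stem1`, `+sufA`, and so on) are hypothetical illustrations, not taken from the paper, and the paper's actual model may differ.

```python
from typing import List, Tuple


def morpheme_stream(segmented_sentence: List[List[str]]) -> List[str]:
    """Flatten a sentence of pre-segmented words into one morpheme sequence."""
    return [m for word in segmented_sentence for m in word]


def cbow_pairs(units: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs in the style of CBOW over any unit sequence."""
    pairs = []
    for i, target in enumerate(units):
        # Context is up to `window` units on each side of the target.
        context = units[max(0, i - window):i] + units[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


# Hypothetical, pre-segmented sentence: each inner list is one word.
sentence = [["stem1", "+sufA", "+sufB"], ["stem2", "+sufA"]]
units = morpheme_stream(sentence)
pairs = cbow_pairs(units, window=1)
```

In this toy input the suffix `+sufA` appears in two different words, so it contributes two training targets even though each full word occurs only once, which is the sparsity argument made in the abstract.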