Uyghur Morphological Segmentation with Bidirectional GRU Neural Networks

Halidanmu ABUDUKELIMU, Yong CHENG, Yang LIU, Maosong SUN
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2017.21.001
2017-01-01
Abstract: Information processing for low-resource, morphologically rich languages such as Uyghur is critical for addressing the language barrier faced by China's One Belt and One Road (B&R) program. In such languages, individual words encode rich grammatical and semantic information by concatenating morphemes to a root, which leads to severe data sparsity in language processing. This paper introduces an approach to Uyghur morphological segmentation, the task of dividing Uyghur words into sequences of morphemes, based on bidirectional gated recurrent unit (GRU) neural networks. The bidirectional GRU uses context on both sides of each position to resolve ambiguities, and its gating mechanism models long-distance dependencies. Experiments show that this approach significantly outperforms conditional random fields (CRFs) and unidirectional GRUs. The approach is language-independent and can be applied to all morphologically rich languages.
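The abstract does not give model details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes segmentation is framed as per-character tagging with a BMES scheme (B = begin, M = middle, E = end of a morpheme, S = single-character morpheme), uses one common GRU formulation without biases, and runs in pure Python with random, untrained weights. The names `GRUCell`, `bigru`, `tags_to_morphemes`, `segment`, the Latin-script `ALPHABET`, and the example word `kitablar` ("books", kitab + -lar) are all hypothetical illustrations. The point is to show how the backward pass is re-aligned so that each character's representation combines left and right context, and how predicted tags become morphemes.

```python
import math
import random

# ASSUMPTION: per-character BMES tagging (not specified in the abstract).
TAGS = ["B", "M", "E", "S"]
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # hypothetical Latin-script input


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def one_hot(ch):
    return [1.0 if c == ch else 0.0 for c in ALPHABET]


class GRUCell:
    """One common GRU formulation; biases omitted to keep the sketch short."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # (W, U) pairs for the update gate z, the reset gate r and the candidate.
        self.wz, self.uz = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.wr, self.ur = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.wc, self.uc = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wz, x), matvec(self.uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wr, x), matvec(self.ur, h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.wc, x), matvec(self.uc, reset_h))]
        # The update gate interpolates between the old state and the candidate,
        # which lets information persist across many time steps.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, xs):
        h, states = [0.0] * self.hidden_dim, []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states


def bigru(xs, fwd, bwd):
    """Concatenate left-to-right and right-to-left states at each position."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # re-reverse so index i matches character i
    return [f + b for f, b in zip(forward, backward)]  # list concatenation


def tags_to_morphemes(word, tags):
    """Split a word at the morpheme boundaries implied by its BMES tags."""
    morphemes, current = [], ""
    for ch, tag in zip(word, tags):
        if tag in ("B", "S") and current:
            morphemes.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            morphemes.append(current)
            current = ""
    if current:
        morphemes.append(current)
    return morphemes


def segment(word, fwd, bwd, out_w):
    """Tag each character with an (untrained) BiGRU, then decode the tags."""
    states = bigru([one_hot(ch) for ch in word], fwd, bwd)
    tags = []
    for s in states:
        scores = matvec(out_w, s)  # linear layer: 2 * hidden -> len(TAGS)
        tags.append(TAGS[scores.index(max(scores))])
    return tags_to_morphemes(word, tags), "".join(tags)


rng = random.Random(0)
HIDDEN = 8
fwd = GRUCell(len(ALPHABET), HIDDEN, rng)
bwd = GRUCell(len(ALPHABET), HIDDEN, rng)
out_w = rand_matrix(len(TAGS), 2 * HIDDEN, rng)
```

With gold tags, `tags_to_morphemes("kitablar", "BMMMEBME")` returns `['kitab', 'lar']`. Because the decoder only inserts boundaries, any tag sequence yields morphemes that concatenate back to the original word, so even the untrained `segment` call produces a well-formed (if arbitrary) split. Training the weights, for example with a cross-entropy loss over the tags, is outside the scope of this sketch.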