TPOS Tagging Method Based on BiLSTM_CRF Model

Lili Wang, Ziyan Chen, Hongwu Yang
DOI: https://doi.org/10.1007/978-981-15-0118-0_38
2019-01-01
Abstract: Part-of-speech (POS) tagging determines the part-of-speech attribute of each word and is a fundamental task in machine translation, speech recognition, information retrieval, and other fields. For Tibetan part-of-speech (TPOS) tagging, this paper proposes a method based on a bidirectional long short-term memory network with a conditional random field layer (BiLSTM_CRF). First, using the designed TPOS tag set and a manually tagged corpus, Tibetan words and their corresponding TPOS tags were embedded with the continuous bag-of-words (CBOW) model to obtain word vectors. Second, the word vectors were fed into the BiLSTM_CRF model. Using the past and future input features learned by the forward and backward long short-term memory (LSTM) networks respectively, the model performs non-linear operations in the softmax layer to obtain a prediction score matrix. This matrix was then fed into the CRF layer, which judges the threshold value and calculates the sequence score error. Finally, a Tibetan POS tagging model based on BiLSTM_CRF was obtained. Experimental results show that the TPOS tagging model based on BiLSTM_CRF reaches an accuracy of 92.7%.
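The CRF step described in the abstract can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. It assumes a standard linear-chain CRF with a transition matrix and start scores, treats the BiLSTM's prediction score matrix as a given list of per-word, per-tag emission scores, and reads the "sequence score error" as the usual CRF negative log-likelihood (log partition function minus the gold path score). The tag set (TAGS) and all numbers are hypothetical toy values, and the function names are illustrative.

```python
import math

# Hypothetical toy tag set (NOT the paper's TPOS tag set).
TAGS = ["NOUN", "VERB", "PART"]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, transitions, start, tags):
    """Score of one tag sequence: start score + emission scores + transition scores."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions, start):
    """Forward algorithm: log of the summed exp-scores of every possible tag sequence."""
    n_tags = len(start)
    alpha = [start[j] + emissions[0][j] for j in range(n_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            + emissions[t][j]
            for j in range(n_tags)
        ]
    return log_sum_exp(alpha)


def neg_log_likelihood(emissions, transitions, start, gold_tags):
    """Assumed CRF training loss for one sentence: log Z minus the gold path score."""
    return log_partition(emissions, transitions, start) - path_score(
        emissions, transitions, start, gold_tags
    )


def viterbi(emissions, transitions, start):
    """Return the highest-scoring tag sequence and its score."""
    n_tags = len(start)
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for current tag j.
            i_max = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_max] + transitions[i_max][j] + emissions[t][j])
            pointers.append(i_max)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag.
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Toy prediction score matrix for a 3-word sentence (rows: words, columns: TAGS),
# standing in for the BiLSTM output; all values are made up.
emissions = [[2.0, 0.5, 0.1], [0.3, 1.8, 0.2], [0.1, 0.4, 1.5]]
transitions = [[0.1, 0.8, -0.5], [0.2, -0.3, 0.9], [0.5, 0.0, -1.0]]  # [prev][next]
start = [0.5, 0.0, -0.5]

best_path, best_score = viterbi(emissions, transitions, start)
print([TAGS[i] for i in best_path])  # ['NOUN', 'VERB', 'PART']
print(neg_log_likelihood(emissions, transitions, start, best_path))
```

Viterbi decoding uses the same recursion as the forward algorithm but replaces log-sum-exp with a max, so both run in O(n·T²) time for n words and T tags instead of enumerating all T^n tag sequences.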