Chinese Word Segmentation of Ideological and Political Education Based on Unsupervised Learning

Yang Xinghai, Zang Wenjing, Meng Shun, Liu Jiafeng, Zhang Yulin
DOI: https://doi.org/10.1145/3358528.3358579
2019-01-01
Abstract: This paper proposes an unsupervised Chinese word segmentation algorithm for the field of ideological and political education. The algorithm has two parts: a language model generation algorithm and the Viterbi algorithm. The language model generation algorithm counts how often characters occur next to one another in a large text corpus and computes conditional probabilities from those counts, yielding a character-level N-gram language model. The Viterbi algorithm, which is based on dynamic programming, then uses this character-level language model to find the optimal word segmentation path, completing the Chinese word segmentation task with the support of the large corpus. Experiments show that the proposed algorithm achieves a good recognition rate for vocabulary in the field of ideological and political education. Because the algorithm is unsupervised, it can save substantial labor costs while meeting the word segmentation needs of this field.
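The two-part pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the N-gram order, the smoothing, or the path-scoring rule, so the sketch assumes a bigram model (N = 2), add-alpha smoothing, a fixed log-score for placing a word boundary (`split_logprob`), and a maximum word length (`max_len`). The function names `train_bigram`, `cond_logprob`, and `segment`, and the toy corpus, are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count single characters and adjacent character pairs in raw, unsegmented text."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(zip(line, line[1:]))
    return unigrams, bigrams


def cond_logprob(a, b, unigrams, bigrams, alpha=1.0):
    """log P(b | a) with add-alpha smoothing (the smoothing is an assumption)."""
    vocab_size = max(len(unigrams), 1)
    return math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab_size))


def segment(text, unigrams, bigrams, split_logprob=math.log(0.3), max_len=4):
    """Viterbi-style dynamic programming over candidate word boundaries.

    best[i] is the best score of any segmentation of text[:i]; back[i] is the
    start index of the last word on that best path. Characters inside a word
    contribute log P(next | previous); each boundary contributes split_logprob.
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            # Candidate last word is text[j:i]; pay for a boundary unless at the start.
            score = best[j] + (split_logprob if j > 0 else 0.0)
            for k in range(j, i - 1):
                score += cond_logprob(text[k], text[k + 1], unigrams, bigrams)
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow the back-pointers from the end of the text to recover the words.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Toy corpus (hypothetical): three domain words in every pairwise order.
corpus = ["思想政治", "政治教育", "教育思想", "思想教育", "政治思想", "教育政治"]
uni, bi = train_bigram(corpus)
print(segment("思想政治教育", uni, bi))  # ['思想', '政治', '教育']
```

In this toy corpus the smoothed probability of a character pair inside a word is 0.5 and across a word boundary is 0.2, so a boundary score of log 0.3 separates the two cases. On real text, `split_logprob` and `max_len` would need tuning, and a higher-order N-gram model may be closer to what the paper uses.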