Public opinion classification and text alignment based on Chinese and Tibetan corpus

Guixian Xu, Haishen Yao, Dongming Wu, Yuan Li, Deguang Ouyang, Gaofeng Chen
DOI: https://doi.org/10.1007/s10586-017-1267-8
2017-01-01
Cluster Computing
Abstract: To support research on the security of Chinese and Tibetan networks, this paper proposes a method for classifying public opinion in Chinese and Tibetan texts. First, web pages are collected. Second, the pages are preprocessed to extract useful information. Third, a table of Chinese and Tibetan public opinion keywords is built. Finally, a text similarity calculation classifies each text against the keyword table. The paper also proposes a Chinese–Tibetan text-level alignment approach that uses a Chinese–Tibetan translation dictionary to match word frequency and position, and it studies a sentence-level alignment algorithm. Alignment performance is related to the translation dictionary used. A system combining public opinion text classification and Chinese–Tibetan text alignment is developed: after a Chinese text is classified, the alignment software finds the most similar Tibetan text and presents it to the user. This research can help identify Chinese and Tibetan public opinion texts and is relevant to information retrieval, text clustering, and Chinese–Tibetan machine translation.
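The abstract does not give formulas, so the following is only a minimal sketch of the two ideas it describes, under stated assumptions: cosine similarity over word counts stands in for the paper's text similarity calculation, and the alignment score uses translated word frequency only (the position matching mentioned in the abstract is omitted). The keyword table, the dictionary entries, the placeholder tokens, and the function names `classify` and `best_alignment` are all hypothetical and assume the texts are already segmented into words.

```python
import math
from collections import Counter

# Hypothetical keyword table (category -> keywords); not taken from the paper.
KEYWORD_TABLE = {
    "economy": ["market", "price", "trade"],
    "education": ["school", "teacher", "student"],
}

# Hypothetical translation dictionary with placeholder tokens, not real vocabulary.
ZH_TO_BO = {"zh_school": "bo_school", "zh_teacher": "bo_teacher", "zh_price": "bo_price"}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(tokens, table=KEYWORD_TABLE):
    """Return the category whose keyword list is most similar to the text,
    or None when no keyword matches at all (assumed behaviour)."""
    doc = Counter(tokens)
    scores = {cat: cosine(doc, Counter(words)) for cat, words in table.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def alignment_score(zh_tokens, bo_tokens, dictionary=ZH_TO_BO):
    """Translate Chinese tokens through the dictionary, then compare the
    resulting word frequencies with those of the Tibetan text."""
    translated = Counter(dictionary[t] for t in zh_tokens if t in dictionary)
    return cosine(translated, Counter(bo_tokens))


def best_alignment(zh_tokens, bo_candidates, dictionary=ZH_TO_BO):
    """Return the index of the candidate Tibetan text with the highest score."""
    scores = [alignment_score(zh_tokens, c, dictionary) for c in bo_candidates]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    print(classify(["school", "teacher", "price"]))
    candidates = [["bo_price", "bo_other"], ["bo_school", "bo_teacher"]]
    print(best_alignment(["zh_school", "zh_teacher"], candidates))
```

In this sketch, tokens missing from the dictionary are simply dropped before scoring, which is one simple way to see why alignment quality is related to the dictionary: the fewer words it covers, the less evidence the score has to work with.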