Semantic Classification Method for Network Tibetan Corpus

Gui-Xian Xu, Chang-Zhi Wang, Li-Hui Wang, Yu-Hong Zhou, Wei-Kang Li, Hao Xu, Qing Huang
DOI: https://doi.org/10.1007/s10586-017-0742-6
2017-01-01
Cluster Computing
Abstract: The number of Tibetan web pages is growing rapidly, and information processing technology can help extract useful knowledge from this content. A Tibetan semantic ontology enriches Tibetan digital resources and helps improve information processing performance. This paper studies semantic classification of a Tibetan network corpus. First, Tibetan web pages are collected. Second, preprocessing extracts the useful information from the web pages. Third, word segmentation and text representation are introduced. Finally, a text similarity classification algorithm is proposed to classify the text. The experiments compare semantic classification with non-semantic classification. The results show that semantic classification clearly outperforms non-semantic classification, which indicates that making full use of ontology semantic relationships can greatly improve classification accuracy. The research is useful for the study of Tibetan semantic information processing.
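The abstract does not give the details of the text similarity classification algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes bag-of-words term-frequency vectors, cosine similarity, and a nearest-neighbor decision; the `ONTOLOGY` mapping, the `TRAIN` documents, and the function names are all hypothetical, and English tokens stand in for segmented Tibetan words.

```python
import math
from collections import Counter

# Hypothetical toy ontology: maps a surface word to a shared concept label.
ONTOLOGY = {
    "yak": "livestock",
    "sheep": "livestock",
    "monastery": "religion",
    "temple": "religion",
}

# Hypothetical labeled training documents, already segmented into words.
TRAIN = [
    (["yak", "grassland", "herder"], "pastoral"),
    (["monastery", "monk", "prayer"], "religion"),
]


def vectorize(tokens, ontology=None):
    """Build a term-frequency vector; with an ontology, words are replaced by their concept."""
    if ontology:
        tokens = [ontology.get(t, t) for t in tokens]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, labeled_docs, ontology=None):
    """Return (label, score) of the most similar training document, or (None, 0.0) if nothing overlaps."""
    query = vectorize(tokens, ontology)
    best_label, best_score = None, 0.0
    for doc_tokens, label in labeled_docs:
        score = cosine(query, vectorize(doc_tokens, ontology))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Without the ontology, "temple" shares no word with any training document.
print(classify(["temple"], TRAIN))
# With the ontology, "temple" and "monastery" map to the same concept.
print(classify(["temple"], TRAIN, ONTOLOGY))
```

In this toy setting the non-semantic run finds no overlap, while the semantic run matches the query to the religion document through the shared concept. This illustrates, under the stated assumptions, why ontology relationships might raise classification accuracy.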