Abstract: The rapid development of Tibetan information technology provides rich resources for Tibetan information processing. Corpus construction is basic work in this field. In this paper, we design a system for collecting Tibetan network data and preprocessing web pages. It accesses web resources in a timely and efficient manner and provides a basis for further analysis of Tibetan data. It can be used to build Tibetan corpora and enrich Tibetan digital resources, which alleviates the sparsity of Tibetan corpus data and the lack of resources, and makes Tibetan information processing more convenient. Hot words reflect the topics that attract Tibetan people’s attention in a given period of time. First, the paper proposes a method for reducing the dimensionality of the feature space of Tibetan news text, which effectively reduces the complexity of subsequent processing. Second, a term weighting method based on an improved TFIDF is proposed for Tibetan text information extraction; it assigns different weights to words in different locations in order to extract the hot words. For sensitive word discovery and public opinion classification, a sensitive thesaurus is collected manually, and sensitive words are extracted by comparing the text against it. Public opinion classification is based on the proposed classification formula and the public opinion thesaurus, and assigns each Tibetan text to one public opinion class. We develop software that automatically collects Tibetan web pages from the network, preprocesses them, extracts text features and hot words, discovers sensitive words, and classifies each Tibetan text into one public opinion class. The experiments show that the Tibetan hot word extraction is effective and that the public opinion classification results are significant.

Semantic Classification Method for Network Tibetan Corpus

The Technology Research of The Semantic Text Classification

Automatic Classification of Tibetan Web Pages

Automatic Text Classification of Tibetan Web Pages Based on Column

Research on Discovering Tibetan Public Sentiment Information from Network

Research on Tibetan Hot Words, Sensitive Words Tracking and Public Opinion Classification

Measuring semantic nouns in Tibetan language

A Semantic Orientation Distinction Method for Opinion Mining in Tibetan Language

Research of Chinese Text Classification Methods Based on Semantic Vector and Semantic Similarity

Public opinion classification and text alignment based on Chinese and Tibetan corpus

Tibetan Concept Similarity Computation Based on Ontology

Efficient Sensitive Information Classification and Topic Tracking Based on Tibetan Web Pages

Study on Classification Technology of Tibetan Text

Text Classification Based on Machine Learning for Tibetan Social Network

A Tibetan Text Classification Method Based on Hybrid Model and Channel Attention Mechanism

The Design and Implementation of Tibetan-Chinese Online Dictionary Based on Semantic Ontology

Machine translation technology based on Tibetan semantic parsing

Tibetan Sentence Sentiment Computing Based on Statistics

A semi-automatic method of ontology population and enrichment in Tibetan language from the WWW

Collection Of Tibetan Network

Tibetan-BERT-wwm: A Tibetan Pretrained Model with Whole Word Masking for Text Classification