Abstract: Sentiment classification is one of the most widely studied sub-problems of opinion mining. It labels evaluative texts as positive or negative, helping people automatically identify the viewpoints behind online user-generated content. Most existing methods for sentiment classification ignore word order and the structural information contained in unlabeled test documents. This paper proposes a transductive learning algorithm that takes both types of information into account. The algorithm first selects key substrings from a suffix tree built over the strings of the training documents and the unlabeled test documents, and then converts each original text document to a bag of numbers of the key substrings. Finally, an SVM classifies the converted documents. Experiments on an open dataset of 16,000 Chinese reviews show promising performance: the accuracy exceeds 93.15%, much better than existing sentiment classification methods such as classifiers based on n-gram features. The experimental results also show that "tfidf-c" performs much better than other term weighting approaches for sentiment classification on a large text corpus. The paper further studies and analyzes the reasons behind the algorithm's strong performance. In addition, the proposed algorithm avoids the messy and rather artificial problem of defining word boundaries in Chinese.
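The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and several points are assumptions: substrings are enumerated by brute force instead of with a suffix tree, a substring counts as "key" when its document frequency reaches a hypothetical `min_df` threshold, "bag of numbers" is read as occurrence counts, and "tfidf-c" is taken to mean cosine-normalized tf-idf, which the abstract does not define. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def substrings(text, max_len):
    """Yield every substring of text whose length is between 1 and max_len."""
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            yield text[i:j]


def select_key_substrings(docs, min_df=2, max_len=2):
    """Assumed criterion: keep substrings that occur in at least min_df documents.

    docs should contain training AND unlabeled test documents, which is the
    transductive part. A suffix tree would do this far more efficiently.
    """
    df = Counter()
    for doc in docs:
        df.update(set(substrings(doc, max_len)))
    return sorted(s for s, n in df.items() if n >= min_df)


def count_occurrences(text, sub):
    """Count possibly overlapping occurrences of sub in text."""
    count, start = 0, text.find(sub)
    while start != -1:
        count += 1
        start = text.find(sub, start + 1)
    return count


def to_vector(doc, keys):
    """Represent a document by the count of each key substring (assumed reading)."""
    return [count_occurrences(doc, k) for k in keys]


def tfidf_cosine(vectors):
    """Assumed form of tfidf-c: tf * log(N / df), scaled to unit length."""
    n = len(vectors)
    dims = len(vectors[0]) if vectors else 0
    df = [sum(1 for v in vectors if v[d] > 0) for d in range(dims)]
    weighted = []
    for v in vectors:
        w = [tf * math.log(n / df[d]) if df[d] else 0.0 for d, tf in enumerate(v)]
        norm = math.sqrt(sum(x * x for x in w))
        weighted.append([x / norm for x in w] if norm else w)
    return weighted


# Toy data: two labeled reviews plus two unlabeled test reviews.
train_docs = ["很好用", "不好用"]
test_docs = ["很好", "不好"]
all_docs = train_docs + test_docs

keys = select_key_substrings(all_docs, min_df=2, max_len=2)
counts = [to_vector(d, keys) for d in all_docs]
features = tfidf_cosine(counts)
```

The rows of `features` could then be passed to any SVM implementation, with the first two rows used for training and the rest for prediction. Because the features are raw character substrings, no word segmentation step is needed.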
