Abstract: Text classification assigns predefined labels to text data and is a core research task in natural language processing. Chinese text classification is particularly challenging because of the language's complex semantics, the difficulty of extracting semantic features, and the interleaving and irregularity of lexical features. Traditional methods often struggle to model the relationships between words and sentences in Chinese, which limits the model's ability to capture deep semantic information and leads to poor classification performance. To address these issues, a Chinese text classification method based on sentence information enhancement and feature fusion is proposed. The method first embeds the text into a unified space, using the BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model to obtain word-vector and sentence-vector feature representations. A sentence information enhancement module is then constructed to perform syntactic enhancement and feature extraction on the sentence information within the text. Finally, a feature fusion strategy combines the enhanced sentence-level features with the word-level features extracted by a Bi-GRU (Bidirectional Gated Recurrent Unit) network to produce the classification output. This approach strengthens the feature representation of Chinese text and filters out much of the irrelevant and noisy information. Evaluations on several Chinese datasets show that the proposed method surpasses existing mainstream classification models in classification accuracy and F1 score, validating its effectiveness and feasibility.
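The data flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. Random vectors stand in for the BERT word and sentence embeddings, a simple sigmoid gate stands in for the sentence information enhancement module (whose internals the abstract does not specify), fusion is assumed to be plain concatenation, and biases are omitted from the GRU cell. All names (`GRUCell`, `enhance_sentence`, `classify`, the dimensions) are hypothetical, and all weights are untrained.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    # Small random weights; a real model would learn these.
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """GRU cell with update gate z, reset gate r, candidate state n (no biases)."""
    def __init__(self, in_dim, hid_dim):
        self.hid_dim = hid_dim
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.Wz = rand_matrix(hid_dim, in_dim + hid_dim)
        self.Wr = rand_matrix(hid_dim, in_dim + hid_dim)
        self.Wn = rand_matrix(hid_dim, in_dim + hid_dim)

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.Wz, x + h)]
        r = [sigmoid(a) for a in matvec(self.Wr, x + h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in matvec(self.Wn, x + reset_h)]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def bi_gru(word_vecs, fwd, bwd):
    """Run the sequence in both directions; return both final states concatenated."""
    h_f = [0.0] * fwd.hid_dim
    for x in word_vecs:
        h_f = fwd.step(x, h_f)
    h_b = [0.0] * bwd.hid_dim
    for x in reversed(word_vecs):
        h_b = bwd.step(x, h_b)
    return h_f + h_b

def enhance_sentence(sent_vec, gate_w):
    """Assumed stand-in for the enhancement module: an element-wise sigmoid gate."""
    gate = [sigmoid(a) for a in matvec(gate_w, sent_vec)]
    return [g * s for g, s in zip(gate, sent_vec)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(word_vecs, sent_vec, params):
    word_feat = bi_gru(word_vecs, params["fwd"], params["bwd"])
    sent_feat = enhance_sentence(sent_vec, params["gate"])
    fused = sent_feat + word_feat  # feature fusion by concatenation (assumption)
    return softmax(matvec(params["out"], fused))

EMB, HID, CLASSES, SEQ_LEN = 8, 4, 3, 5  # toy sizes; BERT-base would use 768
params = {
    "fwd": GRUCell(EMB, HID),
    "bwd": GRUCell(EMB, HID),
    "gate": rand_matrix(EMB, EMB),
    "out": rand_matrix(CLASSES, EMB + 2 * HID),
}
# Random stand-ins for BERT token embeddings and the sentence embedding.
word_vecs = [[random.gauss(0, 1) for _ in range(EMB)] for _ in range(SEQ_LEN)]
sent_vec = [random.gauss(0, 1) for _ in range(EMB)]
probs = classify(word_vecs, sent_vec, params)
print(len(probs), round(sum(probs), 6))
```

In practice the stand-in vectors would come from a pre-trained BERT encoder, and the gate, GRU, and output weights would be trained jointly on labeled Chinese text. The sketch only shows how the sentence-level and word-level branches meet at the fusion step.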
