Abstract: Text classification, the task of assigning predefined labels to text, is a core research problem in natural language processing. Chinese text classification is particularly challenging because of the language's complex semantics, the difficulty of extracting semantic features, and the interleaved, irregular nature of its lexical features. Traditional methods often fail to model the relationships between words and sentences in Chinese, which limits the model's ability to capture deep semantic information and leads to poor classification performance. To address these issues, a Chinese text classification method based on sentence information enhancement and feature fusion is proposed. The method first embeds the text into a unified space, using the BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model to obtain word-vector and sentence-vector feature representations. A sentence information enhancement module is then constructed to perform syntactic enhancement and feature extraction on the sentence-level information in the text. Finally, a feature fusion strategy combines the enhanced sentence-level features with the word-level features extracted by a Bi-GRU (Bidirectional Gated Recurrent Unit) network to produce the classification output. This approach strengthens the feature representation of Chinese text and filters out irrelevant and noisy information. Evaluations on several Chinese datasets show that the proposed method outperforms existing mainstream classification models in classification accuracy and F1 score, supporting its effectiveness and feasibility.
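The pipeline described in the abstract (BERT word and sentence vectors, a sentence-level enhancement module, Bi-GRU word-level features, fusion, classification) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the abstract does not specify the internals of the enhancement module or the fusion strategy, so the gated residual transform and the concatenation used here are assumptions, as are the class name `SentenceEnhancedFusionClassifier` and all layer sizes. The inputs stand in for BERT outputs (for example, the `last_hidden_state` and `pooler_output` returned by a Hugging Face `BertModel`).

```python
import torch
import torch.nn as nn


class SentenceEnhancedFusionClassifier(nn.Module):
    """Hypothetical sketch: fuse an enhanced sentence vector with Bi-GRU word features."""

    def __init__(self, hidden_size=768, gru_size=128, num_classes=10):
        super().__init__()
        # Assumed stand-in for the sentence-level enhancement module:
        # a gated residual transform of the sentence vector.
        self.enhance = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())
        self.gate = nn.Linear(hidden_size, hidden_size)
        # Bi-GRU over the word-level (token) vectors.
        self.bigru = nn.GRU(hidden_size, gru_size, batch_first=True, bidirectional=True)
        # Assumed fusion strategy: concatenation followed by a linear classifier.
        self.classifier = nn.Linear(hidden_size + 2 * gru_size, num_classes)

    def forward(self, word_vectors, sentence_vector, attention_mask):
        # word_vectors: (batch, seq_len, hidden); sentence_vector: (batch, hidden)
        # attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
        gate = torch.sigmoid(self.gate(sentence_vector))
        enhanced = gate * self.enhance(sentence_vector) + (1 - gate) * sentence_vector

        outputs, _ = self.bigru(word_vectors)  # (batch, seq_len, 2 * gru_size)
        # Masked mean pooling over real tokens. Simplification: the GRU itself
        # still reads the padded positions; packed sequences would avoid that.
        mask = attention_mask.unsqueeze(-1).to(outputs.dtype)
        pooled = (outputs * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

        fused = torch.cat([enhanced, pooled], dim=-1)
        return self.classifier(fused)  # unnormalized class logits


if __name__ == "__main__":
    # Random tensors stand in for BERT outputs; sizes are arbitrary.
    model = SentenceEnhancedFusionClassifier(hidden_size=32, gru_size=16, num_classes=5)
    words = torch.randn(3, 7, 32)
    sentence = torch.randn(3, 32)
    mask = torch.ones(3, 7, dtype=torch.long)
    print(model(words, sentence, mask).shape)
```

Concatenation is the simplest fusion choice and keeps the two feature streams separable for inspection; attention-based or gated fusion would slot into the same place in `forward`.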

Long Text Classification Based on BERT

Chinese Text Classification Using BERT and Flat-Lattice Transformer

A Long-Text Classification Method of Chinese News Based on BERT and CNN

Global Semantic Information Extraction Model for Chinese long text classification based on fine-tune BERT

Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT

Long Text Classification with Segmentation

A text classification method based on a convolutional and bidirectional long short-term memory model

CogLTX: Applying BERT to Long Texts

Attention-based BiLSTM fused CNN with gating mechanism model for Chinese long text classification

A Chinese Text Classification Method Based on BERT and Convolutional Neural Network

Research on Text Classification Based on BERT-BiGRU Model

Feature-enhanced text-inception model for Chinese long text classification

A Sentence-level Hierarchical BERT Model for Document Classification with Limited Labelled Data

Chinese text classification method based on sentence information enhancement and feature fusion

A Multi-feature Fusion Method with Attention Mechanism for Long Text Classification

SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus

A BERT-Based Hybrid Short Text Classification Model Incorporating CNN and Attention-Based BiGRU

Hierarchical and lateral multiple timescales gated recurrent units with pre-trained encoder for long text classification

Feature-Enhanced Nonequilibrium Bidirectional Long Short-Term Memory Model for Chinese Text Classification