Abstract: The form of a Chinese character is generally correlated with its meaning, and radical structure in particular gives a clear indication of how characters are related to each other. During the simplification of Chinese characters, some distinct traditional characters were merged into a single simplified character (a many-to-one mapping), resulting in the phenomenon of "one simplified character corresponding to many traditional characters". Compared with simplified characters, traditional characters carry richer structural information, which is also more useful for semantic understanding. Conventional text modelling approaches often overlook the structural content of Chinese characters and the role of human cognitive behaviour in text comprehension. Hence, we propose a Chinese text classification model derived from the construction methods and evolution of Chinese characters. The model consists of two branches, one simplified and one traditional, each with an attention module based on radical classification. Specifically, we first develop a sequential modelling structure to obtain the sequence information of Chinese texts. Next, an associated-word module that uses the radical as a medium is designed to select keywords with high semantic differentiation among the auxiliary units. An attention module is then applied to balance the importance of each keyword in a particular context. We evaluate the proposed method on three datasets to demonstrate its validity and plausibility.
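
As a rough illustration of two ideas in the abstract, the sketch below first inverts a toy traditional-to-simplified table to expose the many-to-one mapping, then weights a set of keyword vectors with softmax attention. It is a minimal sketch under stated assumptions, not the model described above: the three-entry table, the function names, and the use of plain dot-product scoring are all hypothetical choices made for illustration, since the abstract does not specify how attention scores are computed.

```python
import math
from collections import defaultdict

# Toy many-to-one table (traditional -> simplified). Hypothetical sample only;
# a real conversion table is far larger.
TRAD_TO_SIMP = {"發": "发", "髮": "发", "語": "语"}


def invert_mapping(trad_to_simp):
    """Group traditional characters by the simplified character they map to."""
    simp_to_trad = defaultdict(list)
    for trad, simp in trad_to_simp.items():
        simp_to_trad[simp].append(trad)
    return dict(simp_to_trad)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, keyword_vectors):
    """Weight keyword vectors by dot-product similarity to a context query.

    Dot-product scoring is an assumption; the abstract does not name a
    scoring function.
    """
    scores = [sum(q * k for q, k in zip(query, vec)) for vec in keyword_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, keyword_vectors))
        for i in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    # One simplified character can correspond to several traditional ones.
    print(invert_mapping(TRAD_TO_SIMP))
    # The keyword aligned with the query receives the larger weight.
    print(attention_pool([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In the toy table, the simplified character 发 stands for both 發 and 髮, whose radicals differ (癶 and 髟), which is the kind of structural cue that a traditional-character branch can still see.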

Eliminating High-Degree Biased Character Bigrams for Dimensionality Reduction in Chinese Text Categorization

Raising High-Degree Overlapped Character Bigrams Into Trigrams For Dimensionality Reduction In Chinese Text Categorization

CLDA: Feature Selection for Text Categorization Based on Constrained LDA

Select Strong Information Features to Improve Text Categorization Effectiveness

Aggressive Dimensionality Reduction With Reinforcement Local Feature Selection For Text Categorization

A Study On Feature Weighting In Chinese Text Categorization

Learning Effective Features for Chinese Text Categorization

Dimensionality Reduction With Category Information Fusion And Non-Negative Matrix Factorization For Text Categorization

Chinese Text Categorization Based On The Binary Weighting Model With Non-Binary Smoothing

Non-Independent Term Selection for Chinese Text Categorization

Distributional Character Clustering For Chinese Text Categorization

Non-Negative Sparse Semantic Coding for Text Categorization

Scalable Term Selection for Text Categorization

Classifying Chinese Texts in Two Steps

An Effective Dimension Reduction Approach to Chinese Document Classification Using Genetic Algorithm

A High Performance Two-Class Chinese Text Categorization Method

A Chinese text classification model based on radicals and character distinctions

N-grams based feature selection and text representation for Chinese Text Classification

Chinese Text Classification Using Key Characters String Kernel

Application Of The Character-Level Statistical Method In Text Categorization

Chinese Document Categorization without Dictionary Support and Segmentation Processing