Abstract: Many approaches to automatic classification begin with a prescribed set of features. For Chinese aspect classification, the features are normally prescribed as several integrated linguistic feature sets involving temporal, lexical aspectual, or grammatical features. The number of features tends to grow as designers refine the conditions for classification, until the features must finally be optimized to eliminate those that are useless or contradictory. The features for Chinese aspect classification are difficult to optimize because they are discrete, unlike those in other classification tasks. This study proposes a model-based approach to optimizing the features for Chinese aspect classification, illustrated with ZHE aspect markers, by estimating, processing, and testing the correlations between the features. As preparation for building the model, dummy variables are first adopted to represent the discrete Chinese ZHE aspect features. The correlations among the features are then estimated with contingency tables. Highly correlated variables are further combined using Principal Component Analysis. Finally, the performance of the original and the optimized features is verified empirically with logistic models. The 26 optimized feature sets, reduced from the original 40, perform better in comparisons made before and after the optimization. Model-based feature selection approaches, although used extensively in economics, have rarely been applied to Chinese NLP until now. The approach may shed new light on feature selection methods in NLP and has implications for generating rules that revise Chinese ZHE aspects to their target English categories before automatic translation into English.
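The first two steps named in the abstract, dummy coding of discrete features and correlation estimation from contingency tables, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature names `verb_class` and `has_locative` and their values are hypothetical, and Cramér's V is used as the association measure only because the abstract does not say which statistic the study computes from its contingency tables.

```python
from collections import Counter
from math import sqrt


def dummy_encode(values):
    """Return (levels, rows): one 0/1 indicator column per observed level."""
    levels = sorted(set(values))
    rows = [[1 if v == lev else 0 for lev in levels] for v in values]
    return levels, rows


def contingency_table(xs, ys):
    """Cross-tabulate two discrete features into a table of counts."""
    row_levels = sorted(set(xs))
    col_levels = sorted(set(ys))
    counts = Counter(zip(xs, ys))
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


def cramers_v(table):
    """Cramér's V in [0, 1], derived from the chi-square statistic.

    Assumed measure of association; the study may use a different one.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data: two discrete features over six ZHE sentences.
verb_class = ["state", "activity", "state", "activity", "state", "activity"]
has_locative = ["yes", "no", "yes", "no", "yes", "no"]

levels, encoded = dummy_encode(verb_class)
table = contingency_table(verb_class, has_locative)
association = cramers_v(table)
print(levels, encoded[0], table, association)
```

Feature pairs whose association is high would be the candidates for combination in the later Principal Component Analysis step. The sketch keeps one indicator column per level; in a regression setting one level is usually dropped as the reference category.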

Lexicon Optimization for Chinese Language Modeling

A CRF-based Method for Automatic Construction of Chinese Symptom Lexicon

Error feedback based lexical entity extraction for Chinese language modeling

Chinese Lexical Simplification

Simplify the Usage of Lexicon in Chinese NER

A Local Information Perception Enhancement–Based Method for Chinese NER

A three level cache-based adaptive chinese language model

Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement

Improving Language Model Size Reduction Using Better Pruning Criteria

A Model-based Feature Optimization Approach to Chinese Language Processing

Joint n-gram Chinese language modeling with an application to Chinese word segmentation

Lexicon Modeling for Query Understanding

Discriminative Pruning of Language Models for Chinese Word Segmentation

Multilingual Lexical Simplification via Paraphrase Generation

A Word Language Model Based Contextual Language Processing On Chinese Character Recognition

ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models

Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training

Revisit Word Embeddings with Semantic Lexicons for Modeling Lexical Contrast

CNN-Based Chinese NER with Lexicon Rethinking

Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models