Abstract: Many approaches to automatic classification begin with a prescribed set of features. For Chinese aspect classification, the features are normally prescribed as several integrated linguistic feature sets involving temporal, lexical-aspectual, or grammatical features. The number of features often grows gradually as designers refine the conditions for classification, until the features must finally be optimized to eliminate those that are useless or contradictory. The features for Chinese aspect classification are difficult to optimize because they are discrete, unlike those in other classification tasks. This study proposes a model-based approach to optimizing the features for Chinese aspect classification, illustrated with ZHE aspect markers, by estimating, processing, and testing the correlations between the features. As preparation for building the model, dummy variables are first adopted to represent the discrete Chinese ZHE aspect features. The correlations among the features are then estimated using contingency tables. Highly correlated variables are further combined using Principal Component Analysis. Finally, the performance of the original and the optimized features is empirically verified with logistic models. In comparisons before and after optimization, the 26 optimized feature sets, reduced from the original 40, perform better. Model-based feature selection approaches, extensively used in economics, have rarely been applied to NLP for Chinese until now. The approach sheds new light on feature selection methods in NLP and has implications for generating rules that revise Chinese ZHE aspects into their target English categories before automatic translation into English.
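The first two steps named in the abstract (dummy coding of discrete features, then estimating pairwise association from contingency tables) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names (`verb_class`, `time_adverb`, `negation`) and the toy values are hypothetical, and Cramér's V is an assumed choice of association measure, since the abstract only says that correlations are estimated from contingency tables.

```python
from collections import Counter
from math import sqrt


def dummy_encode(values):
    """Return one 0/1 column per category, dropping the first (sorted) level as baseline."""
    levels = sorted(set(values))
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels[1:]}


def contingency_table(xs, ys):
    """Cross-tabulate two discrete features into a list-of-rows count table."""
    counts = Counter(zip(xs, ys))
    rows, cols = sorted(set(xs)), sorted(set(ys))
    return [[counts[(r, c)] for c in cols] for r in rows]


def cramers_v(table):
    """Cramer's V (assumed measure): sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    k = min(len(table), len(table[0])) - 1
    if k == 0:
        return 0.0  # a feature with a single level carries no association
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Hypothetical toy observations for six sentences containing ZHE.
verb_class = ["activity", "activity", "state", "state", "activity", "state"]
time_adverb = ["yes", "yes", "no", "no", "yes", "no"]
negation = ["no", "yes", "no", "yes", "yes", "no"]

dummies = dummy_encode(verb_class)
v_strong = cramers_v(contingency_table(verb_class, time_adverb))
v_weak = cramers_v(contingency_table(verb_class, negation))
print(dummies)
print(round(v_strong, 3), round(v_weak, 3))
```

In this toy data `verb_class` and `time_adverb` are perfectly associated, so they would be candidates for combination in the later Principal Component Analysis step, while `verb_class` and `negation` are only weakly associated. The PCA and logistic-model steps are omitted here; in practice they would normally be done with a statistics library rather than by hand.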

Analysis On Chinese Quantitative Stylistic Features Based On Text Mining

A Study on Chinese Quantitative Stylistic Features and Relation among Different Styles Based on Text Clustering

Discrimination of Chinese quantitative style features based on text clustering

A Quantitative Approach to the Stylistic Assessment of the Middle Chinese Texts

Application of Quantitative Characteristics of Chinese Genres in Text Clustering

Quantitative Stylistic Analysis of Middle Chinese Texts Based on the Dissimilarity of Evolutive Core Word Usage

A Comprehensive Analysis of Text Value and Linguistic Characteristics of Chinese Language Literature Based on Text Mining Technology

Mining Stylistic Features of Rhythm and Tempo Based on Text Clustering

Feature-Opinion Pairs Discovery for Chinese Review

Seeing Various Adventures Through a Mirror: Detecting Translator's Stylistic Visibility in Chinese Translations of Alice's Adventure in Wonderland

Corpus-based Quantitative Analysis on Stylistic Difference of Chinese Synonyms

Word Class, Syntactic Function and Style: A Comparative Study Based on Annotated Corpora

Quantitative Research on Grammatical Characteristics of Noun in Contemporary Chinese

Part-of-Speech Tagging for Chinese-English Mixed Texts with Dynamic Features

Quantitative Typological Analysis of Romance Languages

Typological Features of Zhuang from the Perspective of Word Frequency Distribution

Financial data analysis application via multi-strategy text processing

on Chinese Orientation Analysis

A Model-based Feature Optimization Approach to Chinese Language Processing

Survey on Sentiment Orientation Analysis of Texts

Sentiment Classification for Chinese Reviews Based on Key Substring Features