Abstract: The morphological synergetic model has yet to be fully tested in typical analytic languages. Quantifying Chinese morphology and its relationship with word frequency can help construct and test the model for Chinese. Based on the Lancaster Corpus of Mandarin Chinese, this study proposes a method that quantifies the structural complexity of Chinese words (SCW) using Kolmogorov complexity, and then examines the interrelation between SCW and word frequency. Results show that, among the seven structural types of Chinese words, words formed by combining morphemes in multiple assembling ways generally have higher SCW than words formed in a single assembling way, but derivational affixes affect SCW significantly. The higher the SCW, the lower the word frequency. Given the combined effects of morpheme properties, $y = Ax^{-b}e^{-cx}$ describes this inverse relationship better than $y = Ax^{-b}$. Conversely, the higher the word frequency, the lower the SCW. Delayed negative feedback causes small-scale fluctuations, but $y = Ax^{-b}e^{-cx}$ effectively describes the overall interaction between the two. In terms of the internal mechanism, word frequency changes first, which causes changes in word structure; in turn, for communicative effectiveness, word structure becomes more complex to carry more meaning, which influences word frequency.
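
The comparison between the two curve families named in the abstract can be illustrated with a minimal sketch. It assumes both models are fitted by ordinary least squares after a log transform, which is not necessarily the fitting procedure used in the study. The function names (`fit_power_law`, `fit_power_law_exp`, `log_rss`) are hypothetical, and the data are synthetic values generated for illustration only, not results from the Lancaster Corpus of Mandarin Chinese.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = A * x**(-b) via least squares on ln y = ln A - b ln x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), -np.log(x)])
    (ln_a, b), *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    return np.exp(ln_a), b

def fit_power_law_exp(x, y):
    """Fit y = A * x**(-b) * exp(-c x) via ln y = ln A - b ln x - c x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), -np.log(x), -x])
    (ln_a, b, c), *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    return np.exp(ln_a), b, c

def log_rss(x, y, a, b, c=0.0):
    """Residual sum of squares in log space; c = 0 gives the pure power law."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    predicted = np.log(a) - b * np.log(x) - c * x
    return float(np.sum((np.log(y) - predicted) ** 2))

# Synthetic, noise-free data (illustrative only): x and y stand in for the
# two quantities related in the abstract, e.g. SCW and word frequency.
x = np.arange(1, 21, dtype=float)
y = 1000.0 * x ** -0.8 * np.exp(-0.1 * x)

a_pow, b_pow = fit_power_law(x, y)
a_ext, b_ext, c_ext = fit_power_law_exp(x, y)

rss_pow = log_rss(x, y, a_pow, b_pow)
rss_ext = log_rss(x, y, a_ext, b_ext, c_ext)
print(f"power law only:        log-space RSS = {rss_pow:.4g}")
print(f"with exponential term: log-space RSS = {rss_ext:.4g}")
```

On data that carry an exponential cutoff, the three-parameter form leaves a smaller log-space residual than the pure power law. A goodness-of-fit comparison of this kind is one way to read the abstract's claim that the extended form is "more suitable"; on real data a penalty for the extra parameter would also need to be considered.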

Word frequency approximation for Chinese without using manually-annotated corpus

Word frequency approximation for Chinese using raw, MM-Segmented and manually segmented corpora

Chinese Word Frequency Approximation Based on Multitype Corpora

Chinese Word Segmentation Method Based on Dictionary and Frequency of the Words

Chinese Word Segmentation Without Using Lexicon and Hand-Crafted Training Data

Word extraction based on semantic constraints in Chinese word-formation

Automatic Construction of Chinese Stop Word List

The Structural Complexity of Chinese Words and Its Relationship with Word Frequency

Automatic keyphrase extraction from Chinese news documents

Unsupervised segmentation of Chinese corpus using accessor variety

Chinese Word Segmentation without Using Dictionary Based on Unsupervised Learning Strategy

Exploring Multiple Features for POS Guessing of Chinese Unknown Words with Maximum Entropy Models

Joint n-gram Chinese language modeling with an application to Chinese word segmentation

Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations

Stop word list construction and application in Chinese language processing

A Pragmatic Approach for Classical Chinese Word Segmentation

A Comparative Study on Chinese Word Clustering

A Comparative Study on Chinese Word Segmentation Using Statistical Models

Collocation Extraction Using Monolingual Word Alignment Method

A morphology-based Chinese word segmentation method

Recognize Foreign Low-Frequency Words with Similar Pairs