A Study on Chinese Quantitative Stylistic Features and Relation among Different Styles Based on Text Clustering.

Renkui Hou, Jiang Yang, Minghu Jiang
DOI: https://doi.org/10.1080/09296174.2014.911508
2014-01-01
Journal of Quantitative Linguistics
Abstract: The corpora for this study come from News Co-broadcasting, Daily Conversations and Behind the headlines with Wentao, which represent the formal written style, the colloquial style and the conversational style, respectively. Sentence length, word length, part of speech (POS) and sentence-initial word POS are extracted from the pre-processed corpora as features to generate text vectors, which are then clustered with the PAM (partitioning around medoids) and Ward algorithms. The clustering results show that: (1) it is reasonable to select sentence length, word length, POS and sentence-initial word POS as Chinese quantitative stylistic features; (2) style is a polarized continuum: the formal written style and the colloquial style lie at opposite poles, while the conversational style lies in between, nearer the colloquial pole.
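The pipeline described in the abstract (stylistic features, then text vectors, then clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the four-tag POS set, the toy sentences, the use of simple means and relative frequencies as features, and the plain Euclidean distance are all assumptions made for illustration. Only a basic PAM (BUILD and SWAP phases) is shown; Ward clustering is omitted.

```python
import math
from collections import Counter

# Hypothetical, tiny POS tag set (noun, verb, pronoun, adverb); an assumption.
TAGS = ["n", "v", "r", "d"]

def text_vector(sentences, tags=TAGS):
    """Build a feature vector from non-empty sentences of (word, pos) pairs."""
    words = [w for s in sentences for w, _ in s]
    pos_counts = Counter(p for s in sentences for _, p in s)
    initial_counts = Counter(s[0][1] for s in sentences)
    n_words, n_sents = len(words), len(sentences)
    vec = [n_words / n_sents,                        # mean sentence length (words)
           sum(len(w) for w in words) / n_words]     # mean word length (characters)
    vec += [pos_counts[t] / n_words for t in tags]       # POS relative frequencies
    vec += [initial_counts[t] / n_sents for t in tags]   # sentence-initial POS
    return vec

def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)

def pam(vectors, k):
    """Basic PAM: greedy BUILD, then SWAP until no swap lowers the cost."""
    n = len(vectors)
    dist = [[math.dist(a, b) for b in vectors] for a in vectors]
    medoids = []
    while len(medoids) < k:  # BUILD: add the point that lowers the cost most
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates,
                           key=lambda i: total_cost(dist, medoids + [i])))
    improved = True
    while improved:          # SWAP: try replacing each medoid by a non-medoid
        improved = False
        for slot in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:slot] + [cand] + medoids[slot + 1:]
                if total_cost(dist, trial) < total_cost(dist, medoids):
                    medoids, improved = trial, True
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels

# Toy, hand-tagged texts (invented for illustration, not from the corpora).
texts = [
    [[("国家", "n"), ("经济", "n"), ("持续", "d"), ("发展", "v"), ("取得", "v"), ("成就", "n")]],
    [[("政府", "n"), ("工作", "n"), ("全面", "d"), ("推进", "v"), ("改革", "n")]],
    [[("我", "r"), ("去", "v")], [("你", "r"), ("来", "v")]],
    [[("他", "r"), ("吃", "v")], [("我", "r"), ("走", "v")]],
]
vectors = [text_vector(t) for t in texts]
medoids, labels = pam(vectors, 2)
print(labels)  # each label is the index of the medoid text for that cluster
```

In this sketch the unscaled mean sentence length dominates the distance; on real data the features would probably need to be standardized before clustering.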