Thematic Concentration As a Discriminating Feature of Text Types

Ruina Chen, Haitao Liu
DOI: https://doi.org/10.1080/09296174.2017.1339441
2017-01-01
Journal of Quantitative Linguistics
Abstract: Human readers can generally grasp the gist of a text's thematic content intuitively through comprehensive reading, and this human-like generalization process can be placed on a more exact, quantitative basis. Focusing on three representative text types in Chinese and English drawn from two comparative corpora, LCMC (the Lancaster Corpus of Mandarin Chinese) and Frown (the Freiburg-Brown Corpus of American English), this study compares the thematic characteristics of these texts using PAM (Partitioning Around Medoids) and HA (Hierarchical Agglomerative) clustering on three quantitative indicators: TC (Thematic Concentration), STC (Secondary Thematic Concentration) and PTC (Proportional Thematic Concentration). The results show that (1) the eigenvectors representing the thematic characteristics of the three text types cluster into their corresponding categories in both Chinese and English, and (2) two factors contribute to the clustering results. The first is that the TC, STC and PTC values of the three text types lie at different hierarchical levels; the second is that the percentages of 'thematic words', especially nouns, in the pre-h-point and pre-2h-point domains differ across the three text types. The characterization of the three text types as thematic-intensive (Official Document), thematic-balanced (News) and thematic-dispersive (Fiction) holds cross-linguistically in both Chinese and English.