Abstract: Human readers can generally grasp the gist of a text's thematic content intuitively through comprehensive reading, and this human-like generalization process may be given a more exact, quantitative basis. Focusing on three representative text types in Chinese and English drawn from two comparable corpora, LCMC (the Lancaster Corpus of Mandarin Chinese) and Frown (the Freiburg-Brown Corpus of American English), this study compares the thematic characteristics of these texts using PAM (Partitioning Around Medoids) and HA (Hierarchical Agglomerative) clustering on three quantitative indicators: TC (Thematic Concentration), STC (Secondary Thematic Concentration) and PTC (Proportional Thematic Concentration). The results show that (1) the eigenvectors representing the thematic characteristics of the three text types cluster into their corresponding categories in both Chinese and English, and (2) two factors contribute to the clustering results. The first is that the TC, STC and PTC values of the three text types lie at different hierarchical levels; the second is that the three text types differ in their percentages of 'thematic words', especially nouns, in the pre-h-point and pre-2h-point domains. The characterization of the three text types as thematic-intensive (Official Document), thematic-balanced (News) and thematic-dispersive (Fiction) holds across both languages.
