Comparison study of using semantic and syntactic network characteristics to do text clustering

Xinying CHEN, Haitao LIU
DOI: https://doi.org/10.3778/j.issn.1002-8331.1307-0121
2014-01-01
Abstract: The study builds six dependency syntactic and semantic networks from syntactic and semantic treebanks of different genres and compares their global features: the number of edges, the number of nodes, the average degree, the clustering coefficient, the average path length, the centralization, the diameter, the power-law exponent, and the coefficient of determination. Using these features as variables, the article applies several methods to cluster the networks. The results show that, although the syntactic and semantic networks all follow the same linguistic principles, there are obvious differences between them. The same network parameter carries a different meaning in each kind of network, and clustering on those parameters gives different results. Combinations of the main semantic network parameters yield relatively reasonable clusters but do not separate written style from colloquial style well, whereas combinations of the main syntactic network parameters distinguish the different text styles well and yield reasonable text clustering results.
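As an illustration of the kind of global features listed in the abstract, the following is a minimal sketch that computes a subset of them (nodes, edges, average degree, clustering coefficient, average path length, diameter) for a toy word network. The abstract does not state how the authors computed these values, so everything here is an assumption: the network is treated as undirected and connected, the function names are hypothetical, and the toy edge list is invented for the example rather than taken from any treebank. Centralization and the power-law fit are left out.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (word, word) pairs."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_features(edges):
    """Compute a few global features of an undirected, connected network."""
    adj = build_graph(edges)
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2

    # Local clustering: fraction of a node's neighbour pairs that are linked.
    local = []
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            local.append(0.0)
            continue
        links = sum(1 for v in nb for w in nb if v < w and w in adj[v])
        local.append(2.0 * links / (k * (k - 1)))

    # Average path length and diameter over all ordered pairs of nodes.
    total, pairs, diameter = 0, 0, 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
                diameter = max(diameter, d)

    return {
        "nodes": n,
        "edges": m,
        "average_degree": 2.0 * m / n,
        "clustering_coefficient": sum(local) / n,
        "average_path_length": total / pairs,
        "diameter": diameter,
    }


# Invented toy edges, not data from the study.
toy_edges = [("likes", "she"), ("likes", "tea"), ("tea", "green"), ("she", "tea")]
features = network_features(toy_edges)
print(features)
```

Each network would give one such feature vector, and those vectors could then be passed to any standard clustering method; which methods the study used is not specified in the abstract.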