Using Syntactic Network Characteristics to Do Text Clustering

陈芯莹,刘海涛
DOI: https://doi.org/10.3778/j.issn.1002-8331.1208-0490
2013-01-01
Computer Engineering and Applications Journal
Abstract: This paper builds six dependency syntactic networks from six treebanks of different styles and compares their overall characteristics: the number of nodes, the number of edges, the average degree, the clustering coefficient, the average path length, the centralization, the diameter, the power-law exponent, and the coefficient of determination. Using these characteristics as variables, the paper then performs a cluster analysis of the networks with the Euclidean "shortest distance" method. The results show that a few main network parameters, namely the number of nodes, the clustering coefficient, the average path length, the centralization, and the power-law exponent, can be used to cluster texts. Compared with traditional text clustering, the results are easier to interpret from a linguistic perspective.
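The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes that the "shortest distance" method corresponds to single-linkage agglomerative clustering, which the abstract does not state explicitly. The network names and feature values below are hypothetical placeholders, not figures from the paper, and the z-score standardization is an added assumption so that the node count does not dominate the distance.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def zscore_columns(rows):
    """Standardize each feature column (an assumption, not from the paper)."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out_cols.append([(v - mean) / std if std else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]


def single_linkage(names, vectors):
    """Agglomerative clustering where the distance between two clusters is
    the smallest distance between any pair of their members.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = {i: [i] for i in range(len(names))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            d = min(
                euclidean(vectors[i], vectors[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((
            sorted(names[i] for i in clusters[a]),
            sorted(names[i] for i in clusters[b]),
            d,
        ))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Hypothetical networks and values, for illustration only.
# Columns: nodes, clustering coefficient, average path length,
# centralization, power-law exponent.
names = ["A", "B", "C", "D"]
features = [
    [4000, 0.10, 3.5, 0.20, 2.3],
    [4100, 0.11, 3.4, 0.21, 2.3],
    [9000, 0.25, 2.9, 0.35, 2.0],
    [9200, 0.24, 3.0, 0.34, 2.0],
]

merges = single_linkage(names, zscore_columns(features))
for left, right, dist in merges:
    print(left, "+", right, f"at distance {dist:.3f}")
```

The merge history can be read as a dendrogram: networks that merge at small distances have similar overall characteristics, and each merge can be traced back to the parameters that drive it.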