Influence of Part-of-Speech on Chinese and English Document Clustering

HAN Pu, WANG Dongbo, LIU Yanyun, SU Xinning
DOI: https://doi.org/10.3969/j.issn.1003-0077.2013.02.010
2013-01-01
Abstract: Different parts of speech play different roles in document clustering. Using four popular English and Chinese datasets, the paper chooses three clustering algorithms to investigate the influence of the four major parts of speech, as well as their combinations, on Chinese and English document clustering. The experimental results reveal that nouns are the most important in representing the content of a document. In addition, verbs, adjectives and adverbs also contribute to document clustering. Although similar results are obtained in the experiments, using only nouns to characterize the document cannot produce the best clustering result; it can, however, reduce the document dimensionality to a great extent. The combination of all four parts of speech produces the best clustering result. Individual parts of speech vary considerably in their clustering performance on Chinese and English documents, and the differences are more consistent in Chinese document clustering.
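The dimensionality effect described in the abstract can be illustrated with a minimal sketch. It assumes documents are already tokenized and tagged with coarse part-of-speech labels; the tag names, the toy documents, and the helper `pos_filtered_vectors` are hypothetical and are not taken from the paper, which does not specify its tagger or feature weighting here.

```python
from collections import Counter

# Hypothetical pre-tagged documents as (token, coarse POS tag) pairs.
# The tag set and the toy data are illustrative assumptions, not the paper's datasets.
DOCS = [
    [("market", "NOUN"), ("prices", "NOUN"), ("rose", "VERB"), ("sharply", "ADV")],
    [("team", "NOUN"), ("won", "VERB"), ("important", "ADJ"), ("match", "NOUN")],
    [("prices", "NOUN"), ("fell", "VERB"), ("quickly", "ADV"), ("today", "NOUN")],
]


def pos_filtered_vectors(docs, keep_tags):
    """Build term-frequency vectors from tokens whose tag is in keep_tags.

    Returns the sorted vocabulary and one count vector per document.
    """
    kept = [[tok for tok, tag in doc if tag in keep_tags] for doc in docs]
    vocab = sorted({tok for doc in kept for tok in doc})
    vectors = []
    for doc in kept:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


# Nouns only versus all four major parts of speech.
noun_vocab, noun_vecs = pos_filtered_vectors(DOCS, {"NOUN"})
all_vocab, all_vecs = pos_filtered_vectors(DOCS, {"NOUN", "VERB", "ADJ", "ADV"})

# The noun-only vocabulary is smaller, so each document vector has fewer dimensions.
print(len(noun_vocab), len(all_vocab))
```

The resulting vectors could then be passed to any clustering algorithm; which algorithms and evaluation measures the authors used is not stated in this abstract.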