Research on Web Document Clustering Based on Sentential Maximum Frequent Word Sets

LU Song-Feng, CHEN Yun-Kai, YUAN Li
DOI: https://doi.org/10.3969/j.issn.1002-137X.2007.07.041
2007-01-01
Computer Science
Abstract: Web document clustering is an important research direction in the field of Web mining. The frequent patterns obtained by existing mining algorithms are high-dimensional, and they do not reflect the semantic information expressed in documents well. To obtain more precise clustering results, this paper presents a mining algorithm based on sentential maximum frequent word sets, which is used to mine document characteristic items. Based on these items, documents are first clustered preliminarily. Classes are then merged or split according to the distance between classes and the cohesion within each class, which completes the document clustering. A variable precision rough set model is used to compute the feature vector of each class. Experimental results indicate that the algorithm presented in this paper outperforms traditional document clustering algorithms.
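The abstract does not define "sentential maximum frequent word sets" precisely. The following is a minimal sketch under stated assumptions, not the authors' algorithm. It treats each sentence as a transaction (a set of words) and defines the support of a word set as the number of sentences that contain every word in it. A set is kept only if it is frequent and has no frequent proper superset. The level-wise search is a plain Apriori-style stand-in. The function names, the absolute `min_support` threshold, and the naive sentence splitting (no stemming or stop-word removal) are hypothetical choices made for illustration.

```python
from itertools import combinations


def sentences_to_transactions(text):
    """Hypothetical helper: split text naively on '.', then turn each
    sentence into a frozenset of lowercase words (no stemming, no stop words)."""
    transactions = []
    for sentence in text.split("."):
        words = {w.strip(",;:!?").lower() for w in sentence.split()}
        words.discard("")
        if words:
            transactions.append(frozenset(words))
    return transactions


def frequent_word_sets(transactions, min_support):
    """Apriori-style level-wise search (assumed stand-in for the paper's miner).
    Support = number of sentences containing every word of the set."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single words
    words = sorted({w for t in transactions for w in t})
    current = [frozenset([w]) for w in words
               if support(frozenset([w])) >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # Join step: union pairs of (k-1)-sets that form a k-set
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        prev = set(current)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = [c for c in candidates if support(c) >= min_support]
        frequent.extend(current)
        k += 1
    return frequent


def maximal_word_sets(frequent):
    """Keep only frequent word sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


if __name__ == "__main__":
    doc = ("Web mining finds patterns. Web mining clusters documents. "
           "Clustering groups web documents.")
    tx = sentences_to_transactions(doc)
    result = maximal_word_sets(frequent_word_sets(tx, min_support=2))
    print(sorted(sorted(s) for s in result))
    # prints [['documents', 'web'], ['mining', 'web']]
```

In this toy input, `{documents, mining}` occurs in only one sentence. Because of that, the three-word candidate is pruned, and the two surviving pairs are maximal. Counting support per sentence instead of per document is what ties each word set to a local semantic unit. That reading matches the motivation the abstract gives, but it remains an assumption here.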