A Chinese Web Page Clustering Algorithm Based on the Suffix Tree

Yang Jian-wu
DOI: https://doi.org/10.1007/bf02831687
2004-01-01
Wuhan University Journal of Natural Sciences
Abstract: In this paper, an improved algorithm, named STC-I, is proposed for Chinese Web page clustering based on Chinese language characteristics. It adopts a new unit choice principle and a novel suffix tree construction policy. The experimental results show that the new algorithm keeps the advantages of STC and is better than STC in precision and speed when used to cluster Chinese Web pages.