Semi-Supervised Learning for Web Text Clustering.

Bingru Yang, Wei Song, Zhangyan Xu
2006-01-01
Abstract: Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet for many text classification tasks, providing labeled training documents is expensive, while unlabeled documents are readily available in large quantities. Learning from both labeled and unlabeled documents in a semi-supervised framework is a promising way to reduce the need for labeled training documents. In this paper, a semi-supervised learning method combining rough sets and self-organizing maps (SOM) for Web text clustering is proposed. Rough set theory is used to remove irrelevant attributes from the text representation, based on a small set of labeled documents. The SOM is then applied to the reduced attribute set to generate Web text clusters. Experimental results show the advantages of our approach to a certain extent.
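The two-stage pipeline in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: documents are encoded as binary term-presence attributes, the reduct is found with a greedy dependency-degree heuristic (one common rough-set approach, not necessarily the authors' algorithm), and the SOM is a small NumPy implementation with linearly decaying learning rate and neighborhood width. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def dependency(X, y, attrs):
    """Rough-set dependency degree: the fraction of labeled documents whose
    equivalence class under `attrs` contains a single label."""
    if not attrs:
        # With no attributes, every document falls into one class.
        return 1.0 if len(set(y)) == 1 else 0.0
    groups = {}
    for row, label in zip(X[:, attrs], y):
        groups.setdefault(tuple(row), []).append(label)
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(y)

def greedy_reduct(X, y):
    """Greedy forward selection (an assumed heuristic): add attributes until
    the dependency degree matches that of the full attribute set."""
    all_attrs = list(range(X.shape[1]))
    target = dependency(X, y, all_attrs)
    chosen = []
    while dependency(X, y, chosen) < target:
        best = max((a for a in all_attrs if a not in chosen),
                   key=lambda a: dependency(X, y, chosen + [a]))
        chosen.append(best)
    return sorted(chosen)

def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; returns one weight vector per map unit."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)],
                    dtype=float)
    weights = rng.random((rows * cols, data.shape[1]))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3    # decaying neighborhood width
        for x in data[rng.permutation(len(data))]:
            # Best-matching unit: the map unit closest to the document.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def assign_clusters(data, weights):
    """Map each document to the index of its best-matching unit."""
    return np.array([np.argmin(np.linalg.norm(weights - x, axis=1))
                     for x in data])

# Toy data (hypothetical): 4 labeled documents, 4 binary term attributes.
X_lab = np.array([[1, 1, 0, 1],
                  [1, 1, 1, 0],
                  [0, 1, 0, 1],
                  [0, 1, 1, 0]])
y_lab = [0, 0, 1, 1]

# Unlabeled documents share the same attribute space.
X_unlab = np.array([[1, 1, 1, 1],
                    [0, 1, 0, 0],
                    [1, 1, 0, 0]])

reduct = greedy_reduct(X_lab, y_lab)
X_all = np.vstack([X_lab, X_unlab])[:, reduct].astype(float)
som_weights = train_som(X_all)
clusters = assign_clusters(X_all, som_weights)
```

In this sketch the labeled documents are used only to choose which attributes to keep. The SOM itself is trained without labels on all documents, labeled and unlabeled, restricted to the reduced attribute set.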