Study on Web Text Feature Selection Based on Rough Set.

Xianghua Lu, Weijing Wang
DOI: https://doi.org/10.1007/978-3-642-31588-6_6
2012-01-01
Abstract: This paper uses the vector space model to represent Web text, analyses the features of Web pages written in HTML, and improves the traditional TF-IDF formula so that a feature's weight is calculated according to the term's location in the document. In addition, a text classification system based on the vector space model is studied. Feature selection and text classification are connected: feature terms are selected according to each term's importance to classification, and on this basis the paper proposes a feature selection algorithm based on rough sets. Experiments show that this method not only reduces the dimension of the feature space but also effectively improves classification accuracy. © 2012 Springer-Verlag.
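The abstract does not give the improved TF-IDF formula, so the following is only a minimal sketch of the general idea of location-dependent term weighting, not the authors' method. It assumes that each occurrence of a term is scaled by a weight tied to the HTML tag enclosing it, and that the weighted count is multiplied by a plain logarithmic IDF. The names `LOCATION_WEIGHTS`, `weighted_term_frequencies`, and `location_weighted_tfidf`, and the weight values themselves, are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical location weights (an assumption, not values from the paper):
# terms in the title or in headings count more than body text.
LOCATION_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5}
DEFAULT_WEIGHT = 1.0


class WeightedTermCounter(HTMLParser):
    """Counts terms, scaling each occurrence by the weight of its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []               # currently open tags
        self.weighted_tf = Counter()  # term -> location-weighted count

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the largest weight among all enclosing tags.
        weight = max(
            (LOCATION_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.weighted_tf[term] += weight


def weighted_term_frequencies(html):
    """Return the location-weighted term counts for one HTML document."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.weighted_tf


def location_weighted_tfidf(html_docs):
    """Return one {term: score} dict per document, using idf = log(N / df)."""
    tfs = [weighted_term_frequencies(doc) for doc in html_docs]
    n_docs = len(tfs)
    doc_freq = Counter(term for tf in tfs for term in tf)
    return [
        {term: w * math.log(n_docs / doc_freq[term]) for term, w in tf.items()}
        for tf in tfs
    ]


docs = [
    "<html><head><title>rough set</title></head><body><p>rough text</p></body></html>",
    "<html><body><p>text mining</p></body></html>",
]
scores = location_weighted_tfidf(docs)
```

In this sketch, a term that appears in every document receives a score of zero regardless of where it occurs, while a term found in a title is boosted relative to the same term in body text. The resulting vectors could then be passed to a feature selection step such as the rough-set-based one the abstract describes, which is not reproduced here.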