A rough set-based CBR approach for feature and document reduction in text categorization

Yan Li, S. Shiu, S. Pal, James N. K. Liu
DOI: https://doi.org/10.1109/ICMLC.2004.1382212
2004-08-26
Abstract: A rough set-based case-based reasoning (CBR) approach is proposed for the task of text categorization (TC). This initial work integrates both feature reduction and document reduction (selection) in TC using rough set theory and CBR properties. Rough set theory is used to reduce the number of feature terms by generating reducts, while two CBR concepts, case coverage and case reachability, are used to select representative documents. The main contribution of this paper is that both the number of features and the number of documents are reduced with minimal loss of useful information. Experiments are conducted on the Reuters-21578 text dataset. The results show that, although the numbers of feature terms and documents are greatly reduced, problem-solving quality in terms of classification accuracy is preserved.
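To make the feature-reduction step concrete, the following is a minimal Python sketch of one standard way to compute a rough-set reduct: the greedy QuickReduct heuristic driven by the dependency degree (the fraction of documents whose equivalence class on the chosen terms is category-pure). It is not taken from the paper, since the abstract does not say which reduct algorithm the authors use. It assumes each document is already a discrete vector (for example binary term presence) with a single category label; the function names and toy data are hypothetical.

```python
# Hypothetical sketch: greedy QuickReduct over a discrete document-term table.
# Assumes each row is a tuple of discrete term values (e.g. 0/1 presence).

def partition(rows, attrs):
    """Group row indices into equivalence classes (indiscernibility) on attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: fraction of rows whose equivalence class is label-pure,
    i.e. rows in the positive region of the categories w.r.t. attrs."""
    pos = sum(len(block) for block in partition(rows, attrs)
              if len({labels[i] for i in block}) == 1)
    return pos / len(rows)


def quick_reduct(rows, labels):
    """Greedily add the term that most increases dependency until it matches the
    dependency of the full term set. The result may not be a minimal reduct."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    current = dependency(rows, labels, reduct)
    while current < target:
        remaining = [a for a in all_attrs if a not in reduct]
        # max() keeps the first attribute on ties, so the result is deterministic
        best = max(remaining, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
        current = dependency(rows, labels, reduct)
    return reduct


# Toy example: 4 documents x 4 terms, binary term presence (hypothetical data).
docs = [(1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1)]
cats = ["earn", "earn", "acq", "acq"]
print(quick_reduct(docs, cats))  # [0]
```

Here term 0 alone separates the two categories, so it reaches the full-set dependency of 1.0 and the remaining three terms can be dropped. QuickReduct is a heuristic: it guarantees the same dependency as the full term set, not a minimum-size reduct.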
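The document-selection step relies on case coverage (the set of cases a given case can solve) and case reachability (the set of cases that can solve a given case). The sketch below illustrates one way these could drive selection; it is not the authors' procedure. It assumes a hypothetical "solves" relation (case i solves case j when i is among j's k nearest other cases by Euclidean distance and shares j's category, and every case solves itself) and a greedy set-cover rule that keeps the case covering the most uncovered cases, preferring smaller reachability on ties. All names, the parameter `k`, and the toy data are illustrative assumptions.

```python
import math

# Hypothetical sketch: competence-based document selection with case coverage
# and case reachability. Assumed "solves" relation: case i solves case j when
# i is among j's k nearest other cases (Euclidean) and has the same category;
# every case also solves itself.

def nearest(cases, j, k):
    """Indices of the k cases closest to case j (ties broken by index)."""
    others = [i for i in range(len(cases)) if i != j]
    others.sort(key=lambda i: (math.dist(cases[i][0], cases[j][0]), i))
    return others[:k]


def competence(cases, k):
    """Return (coverage, reachability) sets for every case."""
    n = len(cases)
    coverage = [{i} for i in range(n)]      # cases that case i solves
    reachability = [{i} for i in range(n)]  # cases that solve case i
    for j in range(n):
        for i in nearest(cases, j, k):
            if cases[i][1] == cases[j][1]:
                coverage[i].add(j)
                reachability[j].add(i)
    return coverage, reachability


def select_representatives(cases, k=1):
    """Greedily keep the case covering the most still-uncovered cases; on ties,
    prefer a case with smaller reachability (harder to replace)."""
    coverage, reachability = competence(cases, k)
    uncovered = set(range(len(cases)))
    kept = []
    while uncovered:
        best = max(range(len(cases)),
                   key=lambda i: (len(coverage[i] & uncovered), -len(reachability[i])))
        kept.append(best)
        uncovered -= coverage[best]
    return sorted(kept)


# Toy example: 1-D feature vectors with category labels (hypothetical data).
cases = [((0.0,), "earn"), ((1.0,), "earn"), ((2.0,), "earn"),
         ((10.0,), "acq"), ((11.0,), "acq")]
print(select_representatives(cases, k=1))  # [1, 3]
```

Coverage and reachability differ here because the nearest-neighbour relation is asymmetric: case 1 solves case 2, but case 2 does not solve case 1. Two of the five documents are enough to cover the whole toy case base.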