An Effective Rough Set-Based Method For Text Classification

Yg Bao, D Asai, Xy Du, K Yamada, N Ishii
DOI: https://doi.org/10.1007/978-3-540-45080-1_75
2003-01-01
Abstract: A central problem in effective text classification for IF/IR is the high dimensionality of the data. To cope with this problem, we propose a technique based on rough set theory. Given corpora of documents and a training set of classified example documents, the technique locates a minimal set of co-ordinate keywords that distinguishes between classes of documents, thereby reducing the dimensionality of the keyword vectors. In addition, we generate several reduct bases for classifying new objects, in the hope that combining the answers of multiple reduct bases results in better performance. To obtain tidy and effective rules, we apply value reduction to produce the final rules. This paper describes the proposed technique and provides experimental results.
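To make the idea of a keyword reduct concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a tiny, invented decision table in which each document is a binary keyword vector, and it uses a brute-force search for the smallest keyword subsets that still separate the classes. The keyword names, the class labels, and the helper functions `is_consistent` and `minimal_reducts` are all hypothetical.

```python
from itertools import combinations


def is_consistent(rows, labels, attrs):
    """True if no two documents agree on every keyword in attrs yet differ in class."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        # setdefault stores the first class seen for this keyword pattern
        if seen.setdefault(key, label) != label:
            return False
    return True


def minimal_reducts(rows, labels):
    """Brute force: return all smallest-cardinality keyword subsets that keep the table consistent."""
    n_attrs = len(rows[0])
    for size in range(1, n_attrs + 1):
        found = [subset for subset in combinations(range(n_attrs), size)
                 if is_consistent(rows, labels, subset)]
        if found:
            return found
    return []  # the table is inconsistent even with every keyword


# Hypothetical toy data: columns are keyword presence flags.
keywords = ["rough", "neural", "kernel", "reduct"]
rows = [
    (1, 0, 0, 1),
    (1, 0, 1, 1),
    (0, 1, 1, 0),
    (0, 1, 0, 0),
]
labels = ["rough_sets", "rough_sets", "ml", "ml"]

reducts = minimal_reducts(rows, labels)
reduct_names = [[keywords[i] for i in subset] for subset in reducts]
print(reduct_names)
```

On this toy table three single-keyword subsets each suffice to tell the classes apart, which illustrates why several reduct bases can exist for the same training set. Exhaustive search is exponential in the number of keywords, so a realistic system would need a heuristic instead.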