FF-Based Feature Selection for Improved Classification of Medical Data

YAN WANG, LIZHUANG MA
2009-01-01
Abstract: In processing medical data, choosing the optimal subset of features is important, not only to reduce the processing cost but also to improve the classification performance of the model built from the selected data. The rough set method is recognized as one of the powerful tools for medical feature selection. However, its high storage requirements and time-consuming computation restrict its application. In this paper, we propose two new concepts, the discernibility string and the feature forest, and an efficient algorithm, the Feature Forest Based (FF-Based) algorithm, for generating all reducts of a medical dataset. The algorithm consists of two phases: a feature forest construction phase and a disjunctive normal form computation phase. In the first phase, the feature forest is constructed from discernibility strings, each of which is a concatenation of some of the features between two different cases. In the second phase, the disjunctive normal form is computed from the feature forest to reduce the features. Experimental results on medical datasets from the UCI machine learning repository and on a real liver cirrhosis dataset show that the proposed algorithm can efficiently reduce storage cost and improve classification performance.
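For readers unfamiliar with reducts, the following is a minimal sketch of the classical rough-set route to them: collect the features that discern each pair of cases with different decisions, then expand the resulting conjunction of clauses into a disjunctive normal form whose minimal terms are the reducts. It is not the FF-Based algorithm and builds no feature forest; the function names (`discernibility_sets`, `absorb`, `all_reducts`) and the toy data are assumptions introduced only for illustration.

```python
from itertools import combinations


def discernibility_sets(rows, decisions):
    """For each pair of cases with different decisions, collect the set of
    feature indices on which the two cases differ."""
    entries = set()
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            if diff:  # identical cases with different decisions cannot be discerned
                entries.add(diff)
    return entries


def absorb(sets):
    """Absorption law: keep only the sets that have no proper subset present."""
    return {s for s in sets if not any(t < s for t in sets)}


def all_reducts(rows, decisions):
    """Expand the conjunction of discernibility clauses into a minimal
    disjunctive normal form; each remaining term is one reduct."""
    clauses = absorb(discernibility_sets(rows, decisions))
    terms = {frozenset()}
    for clause in clauses:
        # Distribute the clause over the current terms, then simplify.
        terms = absorb({t | {f} for t in terms for f in clause})
    return sorted(sorted(t) for t in terms)


# Hypothetical toy decision table: three binary features, one decision per case.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
decisions = [0, 1, 1, 0]
print(all_reducts(rows, decisions))  # [[0, 1], [2]]
```

On this toy table, feature 2 alone separates the two decision classes, as does the pair of features 0 and 1. The pairwise comparison is quadratic in the number of cases, which illustrates the kind of storage and time cost the abstract refers to.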