Random forest for imbalanced microarray data classification by weighting strategy

Hualong Yu, Shang Gao, Bin Qin
2011-01-01
Abstract: The class imbalance problem, which occurs frequently in real-world classification tasks, severely degrades prediction performance on minority classes. It is also ubiquitous in microarray datasets, where high dimensionality and small sample sizes intensify the damage. The problem therefore deserves more attention in the field of microarray data classification. In this paper, we use a random forest with a weighting strategy to address it. Experimental results on the well-known colon dataset show that the weighted random forest classifier may achieve better classification performance under several evaluation criteria, including BER (Balanced Error Rate), F-measure and G-mean, which indicates its effectiveness and feasibility. © 2011 ICIC International.
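The three evaluation criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not spell out their formulas, so the code below assumes the usual binary-classification definitions: BER as the mean of the two per-class error rates, F-measure as the harmonic mean of precision and recall on the minority (positive) class, and G-mean as the geometric mean of sensitivity and specificity. The function names and the toy labels are hypothetical and are not taken from the paper or from the colon dataset.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return BER, F-measure and G-mean under the assumed standard definitions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    ber = 0.5 * ((1.0 - sensitivity) + (1.0 - specificity))
    denom = precision + sensitivity
    f_measure = 2.0 * precision * sensitivity / denom if denom else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    return {"BER": ber, "F-measure": f_measure, "G-mean": g_mean}


# Toy labels (hypothetical): 3 minority samples (1) and 7 majority samples (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
scores = imbalance_metrics(y_true, y_pred)
```

On these toy labels plain accuracy is 0.8, yet one of the three minority samples is missed. BER and G-mean weight the two classes equally regardless of their sizes, so they expose that miss more clearly than accuracy does.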