Classification for Imbalanced Microarray Data Based on Oversampling Technology and Random Forest

YU Hua-long, GAO Shang, ZHAO Jing, QIN Bin
DOI: https://doi.org/10.3969/j.issn.1002-137X.2012.05.045
2012-01-01
Computer Science
Abstract: In recent years, applying DNA microarray technology to disease diagnosis, especially cancer diagnosis, has become a hot topic in bioinformatics. Compared with many other kinds of data, microarray data has some unique characteristics. A novel oversampling technique based on probability distribution is proposed to address the class imbalance typical of microarray samples. The technique creates reasonable pseudo samples for the minority class so that the two classes are balanced. A random forest is then used to classify the samples. The effectiveness and feasibility of the method were verified on two benchmark microarray datasets. Experimental results show that the proposed method achieves better classification performance than some traditional approaches.
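The abstract does not say which probability distribution the oversampling step uses, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an independent Gaussian per feature fitted to the minority class; the function names `oversample_minority` and `balance` and the toy expression values are hypothetical.

```python
import random
import statistics


def oversample_minority(minority, n_needed, seed=0):
    """Draw n_needed pseudo samples for the minority class.

    Assumption (not from the paper): each feature is modelled as an
    independent Gaussian whose mean and standard deviation are estimated
    from the minority samples.
    """
    rng = random.Random(seed)
    n_features = len(minority[0])
    # Transpose rows into per-feature columns.
    columns = [[row[j] for row in minority] for j in range(n_features)]
    means = [statistics.mean(col) for col in columns]
    # stdev needs at least two points; fall back to 0.0 otherwise.
    stdevs = [statistics.stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return [
        [rng.gauss(m, s) for m, s in zip(means, stdevs)]
        for _ in range(n_needed)
    ]


def balance(majority, minority, seed=0):
    """Return the minority class padded with pseudo samples to the majority size."""
    n_needed = max(0, len(majority) - len(minority))
    return minority + oversample_minority(minority, n_needed, seed)


# Toy data: 6 majority samples and 2 minority samples, 3 features each.
majority = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(6)]
minority = [[5.0, 6.0, 7.0], [5.4, 6.2, 6.8]]

balanced_minority = balance(majority, minority)
print(len(majority), len(balanced_minority))
```

The balanced classes could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, to carry out the classification step the abstract describes.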