Gene Selection Using Random Forest and Proximity Differences Criterion on DNA Microarray Data

Qifeng Zhou, Wencai Hong, Linkai Luo, Fan Yang
DOI: https://doi.org/10.4156/jcit.vol5.issue6.17
2010-01-01
Journal of Convergence Information Technology
Abstract: Selection of relevant genes for sample classification is a common task in most gene expression studies. As a powerful classification approach, random forest has been applied in this field and shows excellent performance compared with other classification methods. The measure of variable importance is the key to gene selection using random forest. However, existing methods consider only the original variable importance measure based on the out-of-bag (OOB) error. In this paper, we propose a new variable importance measure based on the difference of proximity matrices and use it for gene selection on DNA microarray data. Compared with the existing variable importance analysis of random forest, the new method is more sensitive to informative genes and yields small sets of genes while preserving predictive accuracy.
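The abstract does not spell out the exact proximity-difference criterion, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes scikit-learn's RandomForestClassifier, synthetic data in place of microarray measurements, and a hypothetical score: the drop in the gap between within-class and between-class proximity after a gene's values are permuted. The function names proximity_matrix and proximity_difference_importance are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def proximity_matrix(forest, X):
    """Fraction of trees in which each pair of samples lands in the same leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        prox += col[:, None] == col[None, :]
    return prox / n_trees


def proximity_difference_importance(forest, X, y, rng):
    """Assumed criterion (not taken from the paper): how much permuting one
    feature shrinks the within-class minus between-class mean proximity."""
    same_class = y[:, None] == y[None, :]
    within = same_class & ~np.eye(len(y), dtype=bool)  # exclude self-pairs
    between = ~same_class

    def separation(prox):
        return prox[within].mean() - prox[between].mean()

    base = separation(proximity_matrix(forest, X))
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        scores[j] = base - separation(proximity_matrix(forest, X_perm))
    return scores


# Synthetic stand-in for expression data: 60 samples, 20 "genes",
# of which the first 3 columns are informative (shuffle=False keeps them first).
X, y = make_classification(
    n_samples=60, n_features=20, n_informative=3, n_redundant=0,
    n_repeated=0, shuffle=False, random_state=0,
)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
prox = proximity_matrix(forest, X)
scores = proximity_difference_importance(forest, X, y, np.random.default_rng(0))
ranking = np.argsort(scores)[::-1]  # candidate genes, highest score first
```

A gene selection procedure could then keep the top-ranked genes in ranking and refit the forest on that subset. How many genes to keep, and whether to iterate the ranking, are choices the abstract leaves open.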