The Imbalanced Problem in Mass-Spectrometry Data Analysis

Hao-Hua Meng, Guo-Zheng Li, Rui-Sheng Wang, Xing-Ming Zhao, Luonan Chen
2008-01-01
Abstract: Protein mass-spectrometry data are often imbalanced, i.e., the number of positive examples is much smaller than the number of negative ones, which generally degrades the performance of classifiers used for protein recognition. Despite the importance of this problem, little work has been done to address it. In this paper, we present a new method that uses the EasyEnsemble algorithm to cope with the imbalance problem in mass-spectrometry data. Furthermore, two feature selection algorithms, namely PREE (Prediction Risk based feature selection for EasyEnsemble) and PRIEE (Prediction Risk based feature selection for Individuals of EasyEnsemble), are proposed to select informative features and improve the performance of the EasyEnsemble classifier. Experimental results on three mass spectra data sets show that the proposed methods outperform two existing filter feature selection methods, which demonstrates their effectiveness.
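As a rough illustration of the idea behind EasyEnsemble-style learning, the sketch below draws several balanced training sets by undersampling the majority (negative) class, trains one simple model per set, and combines the models by majority vote. It is a minimal sketch under stated assumptions, not the authors' implementation: EasyEnsemble as usually described trains an AdaBoost ensemble on each balanced subset, whereas here a nearest-centroid classifier is swapped in to keep the example dependency-free. The function names, the toy data, and the voting rule are all hypothetical, and the PREE and PRIEE feature selection steps are not shown.

```python
import random


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Draw n_subsets balanced training sets by undersampling the negatives."""
    rng = random.Random(seed)
    k = len(positives)
    if k > len(negatives):
        raise ValueError("expected negatives to be the majority class")
    subsets = []
    for _ in range(n_subsets):
        # Sample as many negatives as there are positives, without
        # replacement inside a single subset.
        sampled_neg = rng.sample(negatives, k)
        subsets.append((list(positives), sampled_neg))
    return subsets


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train_centroid_classifier(pos, neg):
    """Stand-in base learner (assumption): predict the nearer class centroid."""
    cp, cn = centroid(pos), centroid(neg)

    def predict(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return 1 if dp < dn else 0

    return predict


def ensemble_predict(models, x):
    """Majority vote over the per-subset models."""
    votes = sum(m(x) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Toy imbalanced data (hypothetical): 3 positives versus 30 negatives.
positives = [[5.0, 5.1], [4.8, 5.3], [5.2, 4.9]]
negatives = [[0.1 * i, 0.05 * i] for i in range(30)]

subsets = balanced_subsets(positives, negatives, n_subsets=5)
models = [train_centroid_classifier(p, n) for p, n in subsets]
```

Each model sees every positive example but only a small random slice of the negatives, so no single model is dominated by the majority class, while the ensemble as a whole still covers much of the negative data.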