ANMM4CBR: a Case-Based Reasoning Method for Gene Expression Data Classification

Bangpeng Yao, Shao Li
DOI: https://doi.org/10.1186/1748-7188-5-14
2010-01-01
Algorithms for Molecular Biology
Abstract:

BACKGROUND: Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. The "curse of dimensionality" problem and noise in the data, however, undermine the performance of many algorithms.

METHOD: To obtain a robust classifier, this article proposes a novel Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR) method. ANMM4CBR employs a case-based reasoning (CBR) method for classification. CBR is a suitable paradigm for microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. Moreover, to select the most informative genes, we propose to perform feature selection by additively optimizing a nonparametric margin maximum criterion, which is defined based on gene pre-selection and sample clustering. Our feature selection method is very robust to noise in the data.

RESULTS: The effectiveness of our method is demonstrated on both simulated and real data sets. We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and k-nearest neighbor (kNN), especially when the data contain a high level of noise.

AVAILABILITY: The source code is attached as an additional file of this paper.
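To make the idea in the METHOD paragraph concrete, the following is a minimal sketch of margin-driven additive (greedy forward) gene selection followed by case retrieval. It is not the authors' implementation: the abstract does not give the exact criterion, so a simple nearest-hit/nearest-miss margin is assumed as a stand-in, and the gene pre-selection and sample clustering steps are omitted. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def nonparametric_margin(X, y, features):
    """Assumed stand-in criterion: for each sample, squared distance to the
    nearest sample of another class minus squared distance to the nearest
    sample of the same class, summed over all samples."""
    Z = X[:, features]
    # Pairwise squared Euclidean distances in the selected gene subspace.
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    same = y[:, None] == y[None, :]
    nearest_hit = np.where(same, d, np.inf).min(axis=1)
    nearest_miss = np.where(~same, d, np.inf).min(axis=1)
    return float((nearest_miss - nearest_hit).sum())

def select_features(X, y, n_features):
    """Additive selection: add one gene at a time, each time keeping the
    gene that gives the largest margin together with those already chosen."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_features):
        best = max(remaining,
                   key=lambda j: nonparametric_margin(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

def retrieve_and_classify(X_train, y_train, x_query, features):
    """Case retrieval: return the label of the most similar stored case,
    measured on the selected genes only."""
    d = ((X_train[:, features] - x_query[features]) ** 2).sum(axis=1)
    return y_train[np.argmin(d)]

# Synthetic data (assumption): 40 samples, 20 genes, genes 3 and 7 informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 20))
X[:, 3] += 8.0 * y
X[:, 7] += 8.0 * y

selected = select_features(X, y, n_features=2)

query = np.zeros(20)
query[[3, 7]] = 8.0
predicted = retrieve_and_classify(X, y, query, selected)
```

The greedy loop re-evaluates the margin on the whole training set for every candidate gene, which is affordable only for small candidate pools; this is one plausible reason a pre-selection step precedes the margin optimization in the method described above.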