Abstract:

INTRODUCTION: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are all critical to classification accuracy. To better understand which methods classify such data more accurately, several publicly available peak detection algorithms for Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, the effect of different feature selection methods and different classification models on classification performance has not been addressed. With the application of intelligent computing, much progress has been made in developing feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare feature selection methods and learning classifiers when applied to MALDI-MS data, and to provide a reference for subsequent analysis of MS proteomics data.

RESULTS: We compared a well-known feature selection method, Support Vector Machine Recursive Feature Elimination (SVMRFE), with a recently developed method, Gradient-based Leave-one-out Gene Selection (GLGS), which has proven effective for microarray data analysis. We also compared several learning classifiers: the K-Nearest Neighbor Classifier (KNNC), the Naïve Bayes Classifier (NBC), the Nearest Mean Scaled Classifier (NMSC), the uncorrelated normal-based quadratic Bayes classifier (denoted UDC), Support Vector Machines (SVMs), and the Large Margin Nearest Neighbor classifier (LMNN), a distance metric learning method based on the Mahalanobis distance. For the comparison, we conducted a comprehensive experimental study using three types of MALDI-MS data.

CONCLUSION: Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when the classification models derived from the best training were compared, SVMs performed best with respect to the expected testing accuracy. However, the distance metric learning method LMNN outperformed SVMs and the other classifiers when the best testing performance was evaluated. In such cases, the optimum classification model based on LMNN is worth investigating in future study.
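To make the role of the Mahalanobis distance in a nearest-neighbor classifier concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear map `L` that is fixed by hand, whereas LMNN would learn such a map from labeled training data. The function names, the toy two-feature data, and the class labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mahalanobis(x, y, L):
    """Return ||L (x - y)||, i.e. sqrt((x-y)^T M (x-y)) with M = L^T L."""
    diff = [a - b for a, b in zip(x, y)]
    # Apply the linear map L (a list of rows) to the difference vector.
    proj = [sum(l_ij * d_j for l_ij, d_j in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in proj))


def knn_predict(query, X, y, L, k=3):
    """Majority vote among the k training points closest to `query` under L."""
    ranked = sorted(zip(X, y), key=lambda pair: mahalanobis(query, pair[0], L))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): feature 0 has a large scale but carries no class
# information; feature 1 has a small scale and separates the two classes.
X = [(0.0, 0.0), (10.0, 0.2), (20.0, 0.1),
     (1.0, 1.0), (11.0, 1.2), (21.0, 0.9)]
y = ["class_a", "class_a", "class_a",
     "class_b", "class_b", "class_b"]
query = (14.0, 0.95)  # feature 1 places this point with class_b

identity = [[1.0, 0.0], [0.0, 1.0]]    # plain Euclidean distance
hand_set_L = [[0.01, 0.0], [0.0, 1.0]]  # assumption: down-weights feature 0

print(knn_predict(query, X, y, identity))    # class_a
print(knn_predict(query, X, y, hand_set_L))  # class_b
```

With the identity map, the uninformative large-scale feature dominates the distance and the query is assigned to `class_a`; once that feature is down-weighted, all three nearest neighbors belong to `class_b`. A metric learning method such as LMNN aims to find a map of this kind automatically rather than by hand.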

A Clustering Based Hybrid System for Mass Spectrometry Data Analysis

A Hybrid Feature Selection Algorithm and Its Application in Bioinformatics

MSFC: a new feature construction method for accurate diagnosis of mass spectrometry data

Enhancing mass spectrometry data analysis: A novel framework for calibration, outlier detection, and classification

A Robust Hybrid Approach Based on Estimation of Distribution Algorithm and Support Vector Machine for Hunting Candidate Disease Genes

Markov Random Fields and Mass Spectra Discrimination

Feature Extraction in the Analysis of Proteomic Mass Spectra

Intelligence Algorithms for Protein Classification by Mass Spectrometry

Discrimination Analysis of Mass Spectrometry Proteomics for Ovarian Cancer Detection

A Hybrid Gene Selection Method for Cancer Classification Based on Clustering Algorithm and Euclidean Distance

Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery

A Novel Algorithm for Multi-class Cancer Diagnosis on MALDI-TOF Mass Spectra

Profiling Ms Proteomics Data Using Smoothed Non-Linear Energy Operator and Bayesian Additive Regression Trees

Preprocessing and Classifying of Mass Spectrometry-Based Proteomics Data Using Wavelet Transform and Decision Tree Learning

Peak Tree: a New Tool for Multiscale Hierarchical Representation and Peak Detection of Mass Spectrometry Data

Comparison of Feature Selection and Classification for MALDI-MS Data

Proteomic Profile Analysis and Biomarker Discovery from Mass Spectra Using Independent Component Analysis Combined with Uncorrelated Linear Discriminant Analysis

An Analysis Model of Protein Mass Spectrometry Data and Its Application

Feature selection from proteomic mass spectrometric data using chemometric methods

Cancer Discrimination Based on Decision Trees and Mass Spectral Analysis Data

A Parallel Feature Selection Based on Rough Set Theory for Protein Mass Spectrometry Data