Abstract: Methionine is a proteinogenic amino acid that can be post-translationally modified. It is now well established that reactive oxygen species can oxidise methionine residues within living cells. For a long time, this modification was thought to be merely inevitable damage arising from aerobic metabolism. However, several authors have begun to consider a possible role for methionine oxidation in cell signalling. In recent years, a number of proteomic studies have been carried out to detect proteins containing oxidised methionines. Although these proteomic studies make it possible to pinpoint which methionines are oxidised, they are also arduous, expensive and time-consuming. For these reasons, computational approaches aimed at predicting methionine oxidation sites in proteins are an appealing alternative. In the current work, we address methionine oxidation prediction by combining computational intelligence methods with feature engineering and feature selection techniques to improve the efficacy of several machine learning models while reducing the number of input features needed to achieve high accuracy. We compare random forests, support vector machines, neural networks and flexible discriminant analysis models. Random forests give the best AUC (0.8124 ± 0.0334) and accuracy (0.7590 ± 0.0551) using a reduced set of only 16 features. These results surpass those of previous works.
In addition, we present an end-user script that takes a protein ID as input and returns a list with the predicted oxidation state of every methionine residue found in the analysed protein. Finally, to illustrate the applicability of this tool, we have selected the human α1-antitrypsin protein as a case study. This protein was chosen because it was not among the proteins used to build the predictive models, yet it has been well characterised experimentally in terms of methionine oxidation. The prediction returned by our script fully matches the empirical evidence. Of the nine methionine residues found in this protein, our model predicts the oxidation of only two, M351 and M358, which have been reported, on the basis of mass spectrometry analyses, to be particularly susceptible to oxidation.
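
The two figures of merit quoted above, AUC and accuracy, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the toy labels and scores, the 0.5 decision threshold, and the reading of "±" as a sample standard deviation across resamples are all assumptions made for illustration.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity:
    the fraction of (positive, negative) pairs in which the positive
    example receives the higher score, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of residues classified correctly at a fixed threshold
    (the threshold value is an assumption, not taken from the paper)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def summarise(values):
    """Format per-resample values as 'mean ± sample standard deviation'."""
    return f"{mean(values):.4f} ± {stdev(values):.4f}"


# Hypothetical toy data: 1 = oxidised methionine, 0 = not oxidised.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.4, 0.3, 0.1]),
]
fold_aucs = [auc(y, s) for y, s in folds]
fold_accs = [accuracy(y, s) for y, s in folds]
print("AUC:", summarise(fold_aucs))
print("Accuracy:", summarise(fold_accs))
```

The pairwise form of the AUC is quadratic in the number of residues, which is fine for illustration; a rank-based implementation would be preferable for large datasets.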

Combining feature engineering and feature selection to improve the prediction of methionine oxidation sites in proteins
