Prediction of nitrated tyrosine residues in protein sequences by extreme learning machine and feature selection methods.

Lei Chen, ShaoPeng Wang, Yu-Hang Zhang, Lai Wei, XianLing Xu, Tao Huang, Yu-Dong Cai
DOI: https://doi.org/10.2174/1386207321666180531091619
2018-01-01
Combinatorial Chemistry & High Throughput Screening
Abstract: Background: Accurately recognizing nitrated tyrosine residues from protein sequences would pave the way for understanding the mechanism of nitration and for screening the tyrosine residues in sequences. Results: In this study, we proposed a prediction model that uses the extreme learning machine (ELM) algorithm as the prediction engine to identify nitrated tyrosine residues. To encode each tyrosine residue, a sliding window technique was adopted to extract a peptide segment centered on that residue, from which a number of features were extracted. These features were analyzed by a popular feature selection method, the Minimum Redundancy Maximum Relevance (mRMR) method, which produced a ranked feature list. The Incremental Feature Selection (IFS) method was then used to determine the optimal features, on which the optimal ELM-based prediction model was built. This model produced satisfactory results on the training dataset, with a Matthews correlation coefficient of 0.757. The model was also evaluated on an independent test dataset that contained only positive samples, yielding a sensitivity of 0.938. Conclusion: Compared to other prediction models that use classic machine learning algorithms as prediction engines on the same datasets with their own optimal features, the optimal ELM-based prediction model produced much better results, indicating the superiority of the proposed model for the identification of nitrated tyrosine residues from protein sequences.
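The sliding-window encoding and the two reported metrics can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the window size or how sequence ends are handled, so the `half_width` value and the `X` padding character are assumptions, and the function names are hypothetical.

```python
from math import sqrt


def tyrosine_windows(sequence, half_width=3, pad="X"):
    """Return (position, segment) pairs, one per tyrosine (Y) in the sequence.

    half_width and pad are illustrative assumptions; the abstract does not
    give the window size or the padding scheme used in the paper.
    """
    # Pad both ends so tyrosines near a terminus still get a full-length segment.
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "Y":
            # In padded coordinates the residue sits at i + half_width, so the
            # slice starting at i with length 2*half_width + 1 is centered on it.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def sensitivity(tp, fn):
    """Fraction of positive samples that are predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's datasets.
    for pos, segment in tyrosine_windows("MKYAAGY", half_width=2):
        print(pos, segment)
```

Sensitivity depends only on true positives and false negatives, which is why it can still be computed on a test set that contains only positive samples, whereas the Matthews correlation coefficient needs negative samples as well.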