Research progress of reduced amino acid alphabets in protein analysis and prediction

Yuchao Liang, Siqi Yang, Lei Zheng, Hao Wang, Jian Zhou, Shenghui Huang, Lei Yang, Yongchun Zuo
DOI: https://doi.org/10.1016/j.csbj.2022.07.001
IF: 6.155
2022-07-06
Computational and Structural Biotechnology Journal
Highlights:
• A comprehensive summary of the recent literature on the reduced amino acid strategy and on reduced amino acid alphabets.
• A systematic review of the development history of the reduced amino acid strategy.
• Rich application cases of amino acid reduction strategies are described.
• A detailed analysis of the properties and uses of the reduced amino acid alphabets.

Abstract: Proteins are the executors of cellular physiological activities, and accurate elucidation of their structure and function is crucial for the refined mapping of proteins. As a feature engineering method, the reduction of amino acid composition is not only an important tool for protein structure and function analysis but also opens a broad horizon for machine learning applications. Representing sequences with fewer amino acid types greatly reduces the dimensionality and noise of traditional feature engineering and yields more interpretable predictive models that help machine learning capture key features. In this paper, we systematically review the strategies and methods behind reduced amino acid (RAA) alphabets and summarize the main research applying them to protein sequence alignment, functional classification, and prediction of structural properties. Finally, we give a comprehensive analysis of 672 RAA alphabets from 74 reduction methods.
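To make the dimensionality argument concrete, the following is a minimal Python sketch of re-encoding a sequence with a reduced alphabet and computing dipeptide composition features. The six-cluster grouping in `GROUPS`, the helper names, and the example sequence are all hypothetical choices made for illustration; they are not one of the 672 alphabets analyzed in the paper. With the full 20-letter alphabet a dipeptide feature vector has 20^2 = 400 entries, whereas a six-letter alphabet needs only 6^2 = 36.

```python
from collections import Counter
from itertools import product

# Hypothetical 6-cluster grouping by rough physicochemical similarity.
# Illustration only: this is NOT an alphabet taken from the paper.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(groups):
    """Map every residue to the first letter of its cluster."""
    return {aa: group[0] for group in groups for aa in group}


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(mapping[aa] for aa in seq.upper())


def kmer_composition(seq, alphabet, k=2):
    """Return normalized k-mer frequencies over a fixed alphabet."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return {
        "".join(kmer): counts["".join(kmer)] / total
        for kmer in product(alphabet, repeat=k)
    }


mapping = build_mapping(GROUPS)
reduced = reduce_sequence("MKTAYIAKQR", mapping)  # arbitrary example peptide
alphabet = sorted(set(mapping.values()))
features = kmer_composition(reduced, alphabet, k=2)

print(reduced)        # sequence rewritten with cluster representatives
print(len(features))  # feature dimension: len(alphabet) ** 2
```

Using a fixed alphabet to enumerate every possible k-mer keeps the feature vector the same length for all sequences, which is what a downstream classifier would typically require.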
biochemistry & molecular biology