DeepLA: A deep learning-based model for predicting protein function from protein sequence and evolutionary information.

Bing Jia, Tao Feng, Chenri Li, Baoqi Huang, Fei Hao, Dongjun Liu
DOI: https://doi.org/10.1109/BIBM58861.2023.10385542
2023-01-01
Abstract: Proteins are among the most essential molecules in living organisms and are irreplaceable in the biological processes that sustain life. Predicting their functions is essential for understanding the molecular mechanisms of cellular activity. With the widespread use of second-generation high-throughput sequencing technologies, protein sequence data are being generated and shared at an increasing rate. The rapid accumulation and updating of these data provide a larger and more diverse foundation for protein function prediction research. Traditional biological experiments are the most reliable way to determine protein function. However, relying solely on manual experiments to test unknown proteins one at a time entails a high workload and long lead times. This paper therefore proposes a deep learning-based protein function prediction model called DeepLA. First, the model encodes protein sequences as one-hot vectors and uses the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST) to run an alignment search that yields a Position-Specific Scoring Matrix (PSSM) containing protein evolutionary information. Next, the protein feature vectors are fed into a multi-channel model consisting of a convolutional neural network, a bidirectional long short-term memory network, and a self-attention mechanism, which extracts features for the protein function prediction task. The results show that DeepLA performs well in the Molecular Function (MF) and Biological Process (BP) categories, with values of 0.556 and 0.488, respectively, on the publicly available CAFA3 dataset, which are 5.8% and 8.0% higher than other models.
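As an illustration of the one-hot encoding step mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The alphabet ordering, the function name `one_hot_encode`, the fixed length `max_len`, and the treatment of unknown residues as all-zero rows are all assumptions made for this example; the abstract does not specify any of them.

```python
# Hypothetical ordering of the 20 standard amino acids (not taken from the paper).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Encode a protein sequence as a max_len x 20 matrix of 0s and 1s.

    Sequences longer than max_len are truncated; shorter ones are
    zero-padded. Residues outside the 20-letter alphabet (an assumption
    of this sketch) are left as all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence[:max_len].upper()):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


encoded = one_hot_encode("MKV", max_len=5)
print(len(encoded), len(encoded[0]))
```

In a pipeline like the one described, a matrix of this kind would presumably be combined with the PSSM produced by PSI-BLAST before being passed to the network; how the two are combined is not stated in the abstract.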