Combining GCN and Bi-LSTM for Protein Secondary Structure Prediction

Hailong Jin, Wei Du, Jiawei Gu, Tianhao Zhang, Xiaohu Shi
DOI: https://doi.org/10.1109/bibm52615.2021.9669366
2021-01-01
Abstract: Protein secondary structure prediction is still a challenging task in bioinformatics, especially for 8-state (Q8) classification. To address this problem, we propose a deep learning model that integrates a graph convolutional network (GCN) with a bidirectional long short-term memory (Bi-LSTM) network. In the model, the GCN synthesizes information about amino acids and their interactions, while the Bi-LSTM captures long-range dependencies between amino acids. For sequence representation, a new protein embedding derived from ProtTrans is used instead of the traditional amino acid one-hot encoding, together with evolutionary features from PSSM and HHM profiles. Amino acid contact potential derived from SPOTContact-Helical is used to construct the amino acid graph. To verify the effectiveness of the proposed model, we applied it to several benchmark datasets, where it obtained Q8 accuracies of 78.05%, 76.81%, 72.84%, 74.46% and 76.04% on the CASP10, CASP11, CASP12, CB513 and TS115 datasets, respectively. Compared with eight state-of-the-art competing methods, our model achieved the best performance on most of the datasets.
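For readers unfamiliar with the metric, Q8 accuracy is the percentage of residues whose predicted 8-state label matches the observed one. The snippet below is a minimal sketch of that calculation, not the authors' evaluation code. The function name `q8_accuracy`, the single-letter label alphabet (with `C` standing for coil, which some datasets write as `L` or `-`), and the toy label strings are all assumptions made for illustration.

```python
# Minimal sketch of per-residue Q8 accuracy (illustrative, not the paper's code).
# Assumption: labels use the eight DSSP states, with "C" standing for coil.
Q8_STATES = set("HGIEBTSC")


def q8_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the two label strings agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    unknown = (set(predicted) | set(observed)) - Q8_STATES
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    # Count residues whose predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy example: one of eight residues is mislabeled (T instead of H).
    observed = "CHHHHEEC"
    predicted = "CHHHTEEC"
    print(f"Q8 accuracy: {q8_accuracy(predicted, observed):.2f}%")
```

Dataset-level figures are commonly obtained by pooling residues across all proteins in a benchmark set before dividing; whether the paper pools this way is not stated in the abstract.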