Integrating unsupervised language model with multi-view multiple sequence alignments for high-accuracy inter-chain contact prediction

Zi Liu,Yi-Heng Zhu,Long-Chen Shen,Xuan Xiao,Wang-Ren Qiu,Dong-Jun Yu
DOI: https://doi.org/10.1016/j.compbiomed.2023.107529
IF: 7.7
2023-09-25
Computers in Biology and Medicine
Abstract:Accurate identification of inter-chain contacts in the protein complex is critical to determine the corresponding 3D structures and understand the biological functions. We proposed a new deep learning method, ICCPred, to deduce the inter-chain contacts from the amino acid sequences of the protein complex. This pipeline was built on the designed deep residual network architecture, integrating the pre-trained language model with three multiple sequence alignments (MSAs) from different biological views. Experimental results on 709 non-redundant benchmarking protein complexes showed that the proposed ICCPred significantly increased inter-chain contact prediction accuracy compared to the state-of-the-art approaches. Detailed data analyses showed that the significant advantage of ICCPred lies in the utilization of pre-trained transformer language models which can effectively extract the complementary co-evolution diversity from three MSAs. Meanwhile, the designed deep residual network enhances the correlation between the co-evolution diversity and the patterns of inter-chain contacts. These results demonstrated a new avenue for high-accuracy deep-learning inter-chain contact prediction that is applicable to large-scale protein-protein interaction annotations from sequence alone.
engineering, biomedical,computer science, interdisciplinary applications,mathematical & computational biology,biology
What problem does this paper attempt to address?