Integration of molecular coarse-grained model into geometric representation learning framework for protein-protein complex property prediction

Yang Yue, Shu Li, Yihua Cheng, Zexuan Zhu, Lie Wang, Tingjun Hou, Shan He
DOI: https://doi.org/10.1101/2024.03.14.585015
2024-03-16
Abstract: Structure-based machine learning algorithms have been utilized to predict the properties of protein-protein interaction (PPI) complexes, such as binding affinity, which is critical for understanding biological mechanisms and disease treatments. While most existing algorithms represent PPI complex graph structures at the atom-scale or residue-scale, these representations can be computationally expensive or may not sufficiently integrate finer, chemically plausible interaction details for improving predictions. Here, we introduce MCGLPPI, a novel geometric representation learning framework that combines graph neural networks (GNNs) with the MARTINI molecular coarse-grained (CG) model to predict overall PPI properties accurately and efficiently. This framework maps proteins onto a concise CG-scale complex graph, where nodes represent CG beads and edges encode chemically plausible interactions. The GNN-based encoder is tailored to extract high-quality representations from this graph, efficiently capturing the overall properties of the protein complex structure. Extensive experiments on three different downstream PPI property prediction tasks demonstrate that MCGLPPI achieves competitive performance compared with its counterparts at the atom- and residue-scale, but with only a third of the computational resource consumption. Furthermore, CG-scale pre-training on protein domain-domain interaction structures enhances its predictive capabilities for PPI tasks. MCGLPPI offers an effective and efficient solution for overall PPI property prediction, serving as a promising tool for the large-scale analysis of biomolecular interactions.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses how to predict the properties of protein-protein interaction (PPI) complexes more efficiently without sacrificing accuracy. Most existing algorithms represent the graph structure of a PPI complex at the atom scale or the residue scale. Atom-scale graphs are detailed but computationally expensive, while residue-scale graphs may not capture the finer, chemically plausible interaction details that could improve predictions. The paper therefore proposes MCGLPPI, a geometric representation learning framework that combines graph neural networks (GNNs) with the MARTINI molecular coarse-grained (CG) model to predict the overall properties of PPI complexes accurately and at lower computational cost. In its CG-scale graph representation, nodes are CG beads and edges encode chemically plausible interactions, which lets the GNN-based encoder capture the overall characteristics of the complex structure. In experiments on three downstream PPI property prediction tasks, MCGLPPI performs competitively with atom-scale and residue-scale counterparts while consuming only about one-third of the computational resources. In addition, CG-scale pre-training on protein domain-domain interaction (DDI) structures further improves its predictions on PPI tasks. Overall, MCGLPPI offers an effective and efficient approach for the large-scale analysis of biomolecular interactions.
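To make the idea of a CG-scale complex graph concrete, the following is a minimal sketch and not the MCGLPPI implementation. It rests on simplifying assumptions that do not come from the paper: each residue is collapsed into at most two beads (a backbone bead and a side-chain bead, each placed at the centroid of its atoms) rather than following the actual MARTINI mapping, and an edge is drawn between any two beads within a distance cutoff instead of using the paper's interaction rules. The names `residue_to_beads`, `build_cg_graph`, `toy_chains`, and the cutoff value are hypothetical.

```python
import math
from itertools import combinations

# Assumption: atoms with these names form the backbone bead; all others
# go to a single side-chain bead. The real MARTINI mapping is finer.
BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def centroid(coords):
    """Return the arithmetic mean of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def residue_to_beads(chain_id, res_index, atoms):
    """Collapse one residue into a backbone (BB) and, if present, a side-chain (SC) bead."""
    backbone = [xyz for name, xyz in atoms if name in BACKBONE_ATOMS]
    side = [xyz for name, xyz in atoms if name not in BACKBONE_ATOMS]
    beads = []
    if backbone:
        beads.append({"chain": chain_id, "res": res_index,
                      "type": "BB", "pos": centroid(backbone)})
    if side:
        beads.append({"chain": chain_id, "res": res_index,
                      "type": "SC", "pos": centroid(side)})
    return beads


def build_cg_graph(chains, cutoff=4.5):
    """Build (nodes, edges): nodes are beads, edges join beads within `cutoff`.

    Each edge is (i, j, kind), where kind is "intra" for beads of the same
    chain and "inter" for beads of different chains (the binding interface).
    """
    nodes = []
    for chain_id, residues in chains.items():
        for res_index, atoms in enumerate(residues):
            nodes.extend(residue_to_beads(chain_id, res_index, atoms))
    edges = []
    for i, j in combinations(range(len(nodes)), 2):
        if math.dist(nodes[i]["pos"], nodes[j]["pos"]) <= cutoff:
            same_chain = nodes[i]["chain"] == nodes[j]["chain"]
            edges.append((i, j, "intra" if same_chain else "inter"))
    return nodes, edges


# Hypothetical toy complex: chain A has two residues, chain B has one.
toy_chains = {
    "A": [
        [("N", (0, 0, 0)), ("CA", (1, 0, 0)), ("C", (2, 0, 0)), ("O", (3, 0, 0))],
        [("N", (4, 0, 0)), ("CA", (5, 0, 0)), ("C", (6, 0, 0)), ("O", (7, 0, 0)),
         ("CB", (5, 2, 0))],
    ],
    "B": [
        [("N", (4, 5, 0)), ("CA", (5, 5, 0)), ("C", (6, 5, 0)), ("O", (7, 5, 0))],
    ],
}

nodes, edges = build_cg_graph(toy_chains)
print(len(nodes), "beads,", len(edges), "edges")
```

Even in this toy form, the graph has far fewer nodes than an atom-scale graph of the same complex (4 beads versus 13 atoms here), which is the source of the efficiency gain the summary describes, while the separate side-chain bead retains some detail that a one-node-per-residue graph would lose.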