Learning to design protein-protein interactions with enhanced generalization

Anton Bushuiev, Roman Bushuiev, Petr Kouba, Anatolii Filkin, Marketa Gabrielova, Michal Gabriel, Jiri Sedlar, Tomas Pluskal, Jiri Damborsky, Stanislav Mazurenko, Josef Sivic

2024-03-17

Abstract: Discovering mutations enhancing protein-protein interactions (PPIs) is critical for advancing biomedical research and developing improved therapeutics. While machine learning approaches have substantially advanced the field, they often struggle to generalize beyond training data in practical scenarios. The contributions of this work are three-fold. First, we construct PPIRef, the largest and non-redundant dataset of 3D protein-protein interactions, enabling effective large-scale learning. Second, we leverage the PPIRef dataset to pre-train PPIformer, a new SE(3)-equivariant model generalizing across diverse protein-binder variants. We fine-tune PPIformer to predict effects of mutations on protein-protein interactions via a thermodynamically motivated adjustment of the pre-training loss function. Finally, we demonstrate the enhanced generalization of our new PPIformer approach by outperforming other state-of-the-art methods on new, non-leaking splits of standard labeled PPI mutational data and independent case studies optimizing a human antibody against SARS-CoV-2 and increasing the thrombolytic activity of staphylokinase.

Machine Learning

What problem does this paper attempt to address?

This paper addresses a central problem in protein-protein interaction (PPI) design: predicting and designing mutations that enhance binding affinity with models that generalize beyond their training data. Existing machine learning methods have advanced the field but often fail to generalize in real-world scenarios. The contributions of the paper are:

1. Constructing PPIRef, the largest non-redundant dataset of 3D protein-protein interactions, to support large-scale learning.
2. Introducing PPIformer, a new SE(3)-equivariant model, pre-trained on PPIRef, that generalizes across diverse protein-binder variants.
3. Fine-tuning PPIformer via a thermodynamically motivated adjustment of the pre-training loss function to predict the effects of mutations on PPIs, expressed as changes in binding affinity (ΔΔG).
4. Demonstrating the improved generalization of PPIformer on new, non-leaking splits of standard labeled PPI mutational data and in two independent case studies: optimizing a human antibody against SARS-CoV-2 and increasing the thrombolytic activity of staphylokinase.

The paper points out that existing machine learning methods for PPI data suffer from data redundancy, leakage between training and test sets, and inadequate evaluation metrics, which together lead to overestimated generalization. By constructing a high-quality dataset and proposing a new model, the authors aim to make protein design more reliable and better at generalizing, which matters for healthcare and biotechnology, including the treatment of diseases such as cancer and neurodegenerative disorders.
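The leakage problem described above can be illustrated with a small sketch. It is not the paper's splitting procedure, which is not detailed on this page; it only shows the general idea of a group-wise split, where every mutation measured on the same (or a near-duplicate) interface lands entirely in either the training or the test set. The `cluster` labels, the helper name `group_split`, and the toy records with their ΔΔG values are all hypothetical and exist only for illustration.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that each group is entirely in train or entirely in test."""
    # Bucket the records by their group label (e.g. an interface cluster ID).
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle the group IDs (not the individual records) reproducibly.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    # Hold out a fraction of the groups, at least one.
    n_test = max(1, round(test_fraction * len(group_ids)))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


# Hypothetical toy data: cluster labels and ddG values are made up.
records = [
    {"cluster": "A", "mutation": "K15A", "ddg": 0.8},
    {"cluster": "A", "mutation": "R17A", "ddg": 1.9},
    {"cluster": "B", "mutation": "D39A", "ddg": 0.3},
    {"cluster": "B", "mutation": "E76A", "ddg": -0.2},
    {"cluster": "C", "mutation": "Y29F", "ddg": 0.1},
    {"cluster": "D", "mutation": "H102A", "ddg": 1.1},
]

train, test = group_split(records, "cluster", test_fraction=0.25)
train_clusters = {r["cluster"] for r in train}
test_clusters = {r["cluster"] for r in test}
print("train clusters:", sorted(train_clusters))
print("test clusters:", sorted(test_clusters))
```

A random split over individual records would instead scatter mutations of the same interface across both sets, so a model could score well by memorizing the interface rather than generalizing to new ones.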

Deep Learning Frameworks for Protein–protein Interaction Prediction

Protein-Protein Interaction Prediction is Achievable with Large Language Models

Deep Geometric Representations for Modeling Effects of Mutations on Protein-Protein Binding Affinity

PLM-interact: extending protein language models to predict protein-protein interactions

MpbPPI: a multi-task pre-training-based equivariant approach for the prediction of the effect of amino acid mutations on protein–protein interactions

Reliable prediction of protein-protein binding affinity changes upon mutations with Pythia-PPI

GGL-PPI: Geometric Graph Learning to Predict Mutation-Induced Binding Free Energy Changes

A structurally informed human protein–protein interactome reveals proteome-wide perturbations caused by disease mutations

Structurally-informed human interactome reveals proteome-wide perturbations by disease mutations

Pre-training of Graph Neural Network for Modeling Effects of Mutations on Protein-Protein Binding Affinity

Mutation effect estimation on protein–protein interactions using deep contextualized representation learning

DDMut-PPI: predicting effects of mutations on protein–protein interactions using graph-based deep learning

[Some patterns of the in vivo Plasmodium berghei entry into erythrocytes as revealed by scanning electron microscopy (author's transl)].

BindingGYM: A Large-Scale Mutational Dataset Toward Deciphering Protein-Protein Interactions

Effective Protein-Protein Interaction Exploration with PPIretrieval

Revealing data leakage in protein interaction benchmarks

SGPPI: structure-aware prediction of protein–protein interactions in rigorous conditions with graph convolutional network

Improved protein interaction models predict differences in complexes between human cell lines

Comparing two deep learning sequence-based models for protein-protein interaction prediction

Multi-level Interaction Modeling for Protein Mutational Effect Prediction