Abstract: The revolution brought about by AlphaFold2 opens promising perspectives to unravel the complexity of protein-protein interaction networks. The analysis of interaction networks obtained from proteomics experiments does not systematically provide the delimitations of the interaction regions. This is of particular concern in the case of interactions mediated by intrinsically disordered regions, in which the interaction site is generally small. Using a dataset of protein-peptide complexes involving intrinsically disordered regions that are non-redundant with the structures used in AlphaFold2 training, we show that when using the full sequences of the proteins, AlphaFold2-Multimer achieves only a 40% success rate in identifying the correct site and structure of the interface. By delineating the interaction region into fragments of decreasing size and combining different strategies for integrating evolutionary information, we manage to raise this success rate up to 90%. We obtain similar success rates using a much larger dataset of protein complexes taken from the ELM database. Beyond the correct identification of the interaction site, our study also explores specificity issues. We show the advantages and limitations of using the AlphaFold2 confidence score to discriminate between alternative binding partners, a task that can be particularly challenging in the case of small interaction motifs.

What problem does this paper attempt to address?

The main objective of this paper is to explore how best to use AlphaFold2 to identify specific binding interfaces from protein-protein interaction networks, especially for proteins involving intrinsically disordered regions (IDRs). The researchers built a benchmark set of 42 protein-peptide complexes that is non-redundant with the structures used to train AlphaFold2-Multimer. Comparing different multiple sequence alignment (MSA) strategies, the study found:

1. **Poor performance with full-length proteins**: When full-length protein sequences are used as input, the success rate of AlphaFold2-Multimer is only 42.9%, well below the success rate obtained when the input is restricted to delimited boundaries.
2. **Fragment scanning improves success rate**: Scanning the potential binding region in smaller fragments (e.g., 100 or 200 amino acids) improves the prediction success rate. For fragments extended to 200 amino acids, the success rate reaches 66.7%, compared with 42.9% for full-length proteins.
3. **Advantages of combining different MSA modes**: Combining different MSA generation methods raises the success rate to 90.5%. This indicates that combining several strategies is more effective than any single strategy when predicting protein-peptide complexes.

The study also discusses the specificity of AlphaFold2 predictions, such as how to distinguish between alternative potential binding sites, and evaluates how well AlphaFold2 confidence scores screen possible binding regions. Overall, the paper aims to improve the accuracy of AlphaFold2 predictions for protein interactions involving disordered regions by optimizing the input delimitation and the MSA strategy.
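
The fragment-scanning idea in point 2 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the names `Fragment`, `scan_fragments`, and `implied_count`, the overlapping-window scheme, and the `step` parameter are assumptions made for illustration. The sketch only cuts a long sequence into fixed-size windows that could each be submitted as a separate prediction input, and back-calculates how many of the 42 benchmark complexes each reported percentage would correspond to.

```python
from typing import Iterator, NamedTuple


class Fragment(NamedTuple):
    """A window over a protein sequence (hypothetical helper type)."""
    start: int      # 1-based position of the first residue, inclusive
    end: int        # 1-based position of the last residue, inclusive
    sequence: str


def scan_fragments(sequence: str, size: int, step: int) -> Iterator[Fragment]:
    """Yield overlapping windows of `size` residues, advancing by `step`.

    Assumption: the last window is shifted back so that it keeps the full
    size and still ends on the final residue.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    n = len(sequence)
    if n <= size:
        # The sequence is already short enough: use it as a single fragment.
        yield Fragment(1, n, sequence)
        return
    start = 0
    while True:
        end = min(start + size, n)
        if end == n:
            start = n - size  # keep the final window at full length
        yield Fragment(start + 1, end, sequence[start:end])
        if end == n:
            break
        start += step


def implied_count(percent: float, total: int = 42) -> int:
    """Back-calculate a success count from a reported percentage.

    The counts are inferred from the percentages and the 42-complex
    benchmark size; they are not quoted from the source.
    """
    return round(percent * total / 100)


if __name__ == "__main__":
    toy_sequence = "A" * 230  # placeholder sequence, not real data
    for frag in scan_fragments(toy_sequence, size=100, step=50):
        print(frag.start, frag.end, len(frag.sequence))
    for pct in (42.9, 66.7, 90.5):
        print(pct, "->", implied_count(pct), "of 42")
```

Each yielded fragment would then be paired with the partner protein and run through the predictor. How the windows overlap and how the resulting models are ranked are design choices that this sketch leaves open.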

From interaction networks to interfaces, scanning intrinsically disordered regions using AlphaFold2

Systematic discovery of protein interaction interfaces using AlphaFold and experimental validation

Revealing missing protein-ligand interactions using AlphaFold predictions

Improved prediction of protein-protein interactions using AlphaFold2

Using AlphaFold Multimer to discover interkingdom protein-protein interactions

AlphaFold-Multimer accurately captures interactions and dynamics of intrinsically disordered protein regions

Integration of a Randomized Sequence Scanning Approach in AlphaFold2 and Local Frustration Profiling of Conformational States Enable Interpretable Atomistic Characterization of Conformational Ensembles and Detection of Hidden Allosteric States in the ABL1 Protein Kinase

Unmasking AlphaFold to integrate experiments and predictions in multimeric complexes

actifpTM: a refined confidence metric of AlphaFold2 predictions involving flexible regions

Identifying protein-protein interface via a novel multi-scale local sequence and structural representation

Enhanced Protein-Protein Interaction Discovery via AlphaFold-Multimer

Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes

Predicting direct physical interactions in multimeric proteins with deep learning

Unmasking AlphaFold: integration of experiments and predictions in multimeric complexes

Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants

Generation of a high confidence set of domain-domain interface types to guide protein complex structure predictions by AlphaFold

Dissecting AlphaFold's Capabilities with Limited Sequence Information

AlphaFold and Implications for Intrinsically Disordered Proteins

Benchmarking Refined and Unrefined AlphaFold2 Structures for Hit Discovery

Accurate structure prediction of biomolecular interactions with AlphaFold 3