From interaction networks to interfaces, scanning intrinsically disordered regions using AlphaFold2

Hélène Bret, Jinmei Gao, Diego Javier Zea, Jessica Andreani, Raphaël Guerois
DOI: https://doi.org/10.1038/s41467-023-44288-7
IF: 16.6
2024-01-18
Nature Communications
Abstract: The revolution brought about by AlphaFold2 opens promising perspectives to unravel the complexity of protein-protein interaction networks. The analysis of interaction networks obtained from proteomics experiments does not systematically provide the delimitations of the interaction regions. This is of particular concern in the case of interactions mediated by intrinsically disordered regions, in which the interaction site is generally small. Using a dataset of protein-peptide complexes involving intrinsically disordered regions that are non-redundant with the structures used in AlphaFold2 training, we show that when using the full sequences of the proteins, AlphaFold2-Multimer only achieves 40% success rate in identifying the correct site and structure of the interface. By delineating the interaction region into fragments of decreasing size and combining different strategies for integrating evolutionary information, we manage to raise this success rate up to 90%. We obtain similar success rates using a much larger dataset of protein complexes taken from the ELM database. Beyond the correct identification of the interaction site, our study also explores specificity issues. We show the advantages and limitations of using the AlphaFold2 confidence score to discriminate between alternative binding partners, a task that can be particularly challenging in the case of small interaction motifs.
multidisciplinary sciences
What problem does this paper attempt to address?
The paper asks how AlphaFold2 can best be used to locate specific binding interfaces within protein-protein interaction networks, particularly when the interaction is mediated by intrinsically disordered regions (IDRs). The researchers assembled a benchmark of 42 protein-peptide complexes that are non-redundant with the structures used to train AlphaFold2-Multimer, and compared several multiple sequence alignment (MSA) strategies. The main findings are:

1. **Poor performance with full-length proteins**: With full-length protein sequences as input, AlphaFold2-Multimer reached a success rate of only 42.9%, well below the rate obtained when the input is restricted to the delimited interaction region.
2. **Fragment scanning improves the success rate**: Scanning the candidate binding region in smaller fragments (e.g., 100 or 200 amino acids) improves prediction. For fragments extended to 200 amino acids, the success rate reached 66.7%.
3. **Advantages of combining different MSA modes**: Combining different MSA generation methods raised the success rate to 90.5%, which indicates that several strategies used together are more effective than any single one for predicting protein-peptide complexes.

The study also examines the specificity of AlphaFold2 predictions, such as how well alternative candidate binding sites or partners can be told apart, and evaluates how useful the AlphaFold2 confidence score is for screening possible binding regions. Overall, the paper aims to improve AlphaFold2 prediction accuracy for interactions involving disordered regions by optimizing how the input sequences are delimited and how the MSAs are built.
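
The fragment-scanning idea can be illustrated with a minimal Python sketch. It only shows how a long sequence might be cut into overlapping windows before each window is submitted for prediction. It is not the authors' pipeline. The window sizes of 100 and 200 residues come from the summary above, while the function name `scan_fragments`, the `step` parameter, and the overlap scheme are assumptions made for illustration. Running AlphaFold2-Multimer on each fragment and ranking the results by confidence score is left out.

```python
from typing import List, Tuple

# (start, end, subsequence) with 0-based, half-open coordinates
Fragment = Tuple[int, int, str]


def scan_fragments(sequence: str, window: int = 100, step: int = 50) -> List[Fragment]:
    """Cut `sequence` into overlapping fragments of length `window`.

    Hypothetical helper: the overlap scheme (`step`) is an assumption,
    not a parameter taken from the paper.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    n = len(sequence)
    # A sequence shorter than one window is returned whole.
    if n <= window:
        return [(0, n, sequence)]

    starts = list(range(0, n - window + 1, step))
    # Add a final window flush with the C-terminus so no residue is skipped.
    if starts[-1] != n - window:
        starts.append(n - window)

    return [(s, s + window, sequence[s:s + window]) for s in starts]


if __name__ == "__main__":
    toy_idr = "ACDEFGHIKLMNPQRSTVWY" * 12  # 240-residue toy sequence
    for start, end, frag in scan_fragments(toy_idr, window=100, step=50):
        print(start, end, len(frag))
```

Each fragment would then be paired with the partner protein and modelled separately. Overlapping windows reduce the chance that a short binding motif is split across a fragment boundary.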