AlphaFold2 knows some protein folding principles

Liwei Chang, Alberto Perez
DOI: https://doi.org/10.1101/2024.08.25.609581
2024-08-26
Abstract: AlphaFold2 (AF2) has revolutionized protein structure prediction. However, a common confusion lies in equating the protein structure prediction problem with the protein folding problem. The former provides a static structure, while the latter explains the dynamic folding pathway to that structure. We challenge the current status quo and advocate that AF2 has indeed learned some protein folding principles, despite being designed for structure prediction. AF2's high-dimensional parameters encode an imperfect biophysical scoring function. Typically, AF2 uses multiple sequence alignments (MSAs) to guide the search within a narrow region of its learned surface. In our study, we operate AF2 without MSAs or initial templates, forcing it to sample its entire energy landscape - more akin to an ab initio approach. Among over 7,000 proteins, a fraction fold using sequence alone, highlighting the smoothness of AF2's learned energy surface. Additionally, by combining recycling and iterative predictions, we discover multiple AF2 intermediate structures in good agreement with known experimental data. AF2 appears to follow a "local first, global later" folding mechanism. For designed proteins with more optimized local interactions, AF2's energy landscape is too smooth to detect intermediates even when it should. Our current work sheds new light on what AF2 has learned and opens exciting possibilities to advance our understanding of protein folding and for experimental discovery of folding intermediates.
Biophysics
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores what AlphaFold2 (AF2) has learned about protein folding, and in particular whether it can reveal intermediate states along a folding pathway. Although AF2 was designed to predict protein structures, the authors argue that it has also learned some fundamental principles of protein folding. Specifically, the paper addresses the following points:

1. **Difference between protein folding and structure prediction**: Structure prediction yields a static picture of the folded state, while the protein folding problem seeks to explain the dynamic process that leads to it, including pathways and intermediate states.
2. **AF2's energy landscape**: Running AF2 without multiple sequence alignments (MSAs) or initial templates forces it to sample its entire learned energy landscape, which tests whether AF2 can reproduce intermediate states of the folding process.
3. **Iterative prediction method**: By combining recycling with iterative predictions, the authors find that AF2 produces multiple intermediate structures in good agreement with known experimental data.
4. **"Local first, global later" folding mechanism**: AF2 appears to follow a "local first, global later" mechanism, in which local structure dominates early folding and global structure forms gradually afterward.
5. **Consistency across different protein folding pathways**: Studies of several proteins (protein G, protein L and their mutants, ubiquitin, and SH3 domains) show that AF2 can capture complex folding pathways and reveal how sequence variations alter them.
6. **Large-scale protein folding prediction**: The iterative prediction method was applied to 7,418 proteins, assessing AF2's performance across proteins of different sizes and structural types and identifying groups of proteins with similar folding pathways.

Through these studies, the paper argues that AF2 can not only predict static protein structures but also reveal intermediate states in the folding process, offering a new route to understanding protein folding and to discovering folding intermediates.
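
The "local first, global later" idea in point 4 can be made concrete by counting residue contacts in a structure and splitting them by sequence separation. The sketch below is an illustration only, not the authors' analysis. The 8 Å C-alpha cutoff, the minimum separation of 3, the local/global boundary of 5 residues, and the toy `extended_chain` and `hairpin` geometries are all assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_pairs(ca: Sequence[Coord], cutoff: float = 8.0,
                  min_sep: int = 3) -> List[Tuple[int, int]]:
    """Return residue index pairs (i, j) whose C-alpha atoms lie within
    `cutoff` angstroms and are at least `min_sep` apart in sequence.
    Both thresholds are assumed values, not taken from the paper."""
    pairs = []
    for i in range(len(ca)):
        for j in range(i + min_sep, len(ca)):
            if math.dist(ca[i], ca[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


def local_global_counts(ca: Sequence[Coord], local_max_sep: int = 5,
                        cutoff: float = 8.0) -> Tuple[int, int]:
    """Count contacts as local (sequence separation <= local_max_sep)
    or global (anything farther apart in sequence)."""
    pairs = contact_pairs(ca, cutoff=cutoff)
    n_local = sum(1 for i, j in pairs if j - i <= local_max_sep)
    return n_local, len(pairs) - n_local


def extended_chain(n: int = 8, spacing: float = 3.8) -> List[Coord]:
    """Toy fully extended chain: residues on a straight line."""
    return [(spacing * i, 0.0, 0.0) for i in range(n)]


def hairpin(n_per_strand: int = 4, spacing: float = 3.8,
            gap: float = 5.0) -> List[Coord]:
    """Toy hairpin: one strand along x, a second strand running back
    antiparallel at distance `gap`."""
    up = [(spacing * i, 0.0, 0.0) for i in range(n_per_strand)]
    down = [(spacing * (n_per_strand - 1 - i), gap, 0.0)
            for i in range(n_per_strand)]
    return up + down


if __name__ == "__main__":
    # A series of structures (here just two toy states) stands in for
    # successive intermediates from an iterative prediction run.
    for name, coords in [("extended", extended_chain()), ("hairpin", hairpin())]:
        n_local, n_global = local_global_counts(coords)
        print(f"{name}: local={n_local} global={n_global}")
```

Applied to an ordered series of predicted intermediates, a rise in the local count before the global count would be in line with a "local first, global later" pathway under these assumed thresholds.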