Assessing AF2’s ability to predict structural ensembles of proteins

Jakob R. Riccabona, Fabian C. Spoendlin, Anna-Lena M. Fischer, Johannes R. Loeffler, Patrick K. Quoika, Timothy P. Jenkins, James A. Ferguson, Eva Smorodina, Andreas H. Laustsen, Victor Greiff, Stefano Forli, Andrew B. Ward, Charlotte M. Deane, Monica L. Fernández-Quintero
DOI: https://doi.org/10.1101/2024.04.16.589792
2024-04-17
Abstract: Recent breakthroughs in protein structure prediction have enhanced the precision and speed at which protein configurations can be determined, setting new benchmarks for accuracy and efficiency in the field. However, the fundamental mechanisms of biological processes at a molecular level are often connected to conformational changes of proteins. Molecular dynamics (MD) simulations serve as a crucial tool for capturing the conformational space of proteins, providing valuable insights into their structural fluctuations. Yet the scope of MD simulations is often limited by the accessible timescales and the computational resources available, which makes it challenging to explore protein behavior comprehensively. Recently emerging approaches have focused on expanding the capability of AlphaFold2 (AF2) to predict conformational substates of protein structures by manipulating the input multiple sequence alignment (MSA). These approaches operate under the assumption that the MSA also contains information about the heterogeneity of protein structures. Here, we benchmark the performance of various workflows that have adapted AF2 for ensemble prediction, focusing on the subsampling of the MSA as implemented in ColabFold, and compare the obtained structures with ensembles obtained from MD simulations and NMR. As test cases, we chose four proteins, namely the bovine pancreatic trypsin inhibitor (BPTI), thrombin, and two antigen-binding fragments (antibody Fv and nanobody), for which reliable, experimentally validated structural information (X-ray and/or NMR) was available. Thus, we provide an overview of the levels of performance and accessible timescales that can currently be achieved with machine learning (ML)-based ensemble generation. In three out of the four test cases, we find that structural variations fall within the predicted ensembles. Nevertheless, significant minima of the free energy surfaces remain undetected.
This study highlights the possibilities and pitfalls of generating ensembles with AF2 and may thus guide the development of future tools while informing the interpretation of results from currently available applications.
Biophysics
What problem does this paper attempt to address?
This paper evaluates how well AlphaFold2 (AF2) can predict ensembles of protein structures. Specifically, the researchers assess how modifications, such as adjusting the subsampling size of the multiple sequence alignment (MSA) and the number of recycles, affect the quality of the predicted structural ensembles. They compare the resulting ensembles with experimental data (X-ray and NMR) and with molecular dynamics simulation results. The goal of the study is to explore the potential and limitations of AF2 in predicting protein conformational diversity, thereby providing guidance for the development of future tools and informing the interpretation of results from current applications.
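To make the idea of MSA subsampling concrete, the following is a minimal Python sketch. It is not the procedure used by AF2 or ColabFold, which select sequences inside the model pipeline; it only illustrates the general principle of giving the predictor a different, shallower alignment for each run so that different conformations may be favored. The function names `subsample_msa` and `subsampled_ensemble_inputs`, the parameter `max_seq`, and the toy alignment are all hypothetical and chosen for illustration.

```python
import random


def subsample_msa(msa, max_seq, seed=0):
    """Return the query plus up to max_seq - 1 randomly chosen other rows.

    Assumption: msa is a list of aligned sequences (strings) whose first
    entry is the query. The query is always kept, and the selected rows
    keep their original relative order.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    if max_seq < 1:
        raise ValueError("max_seq must be >= 1")
    query, others = msa[0], msa[1:]
    rng = random.Random(seed)  # seeded for reproducible subsamples
    k = min(max_seq - 1, len(others))
    picked = sorted(rng.sample(range(len(others)), k))
    return [query] + [others[i] for i in picked]


def subsampled_ensemble_inputs(msa, max_seq, n_seeds):
    """Build one subsampled MSA per seed; each would feed one prediction run."""
    return [subsample_msa(msa, max_seq, seed=s) for s in range(n_seeds)]


if __name__ == "__main__":
    # Toy alignment, purely illustrative (not real protein data).
    toy_msa = ["ACDE", "ACDF", "ACEE", "GCDE", "ACDQ"]
    for i, sub in enumerate(subsampled_ensemble_inputs(toy_msa, 3, 4)):
        print(f"seed {i}: {sub}")
```

In this sketch, a smaller `max_seq` gives the predictor less coevolutionary information per run, and repeating the draw with different seeds yields a set of inputs whose predictions together form the candidate ensemble. The number of recycles mentioned above is a separate setting of the predictor and is not modeled here.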