More Structures, Less Accuracy: ESM3's Binding Prediction Paradox

Thomas Loux, Dianzhuo Wang, Eugene Shakhnovich
DOI: https://doi.org/10.1101/2024.12.09.627585
2024-12-09
Abstract: This paper investigates the impact of incorporating structural information into the protein-protein interaction predictions made by ESM3, a multimodal protein language model (pLM). We utilized various structural variants as inputs and compared three widely used structure acquisition pipelines, EvoEF2, Gromacs, and Rosetta Relax, to assess their effects on ESM3's performance. Our findings reveal that using a single identical structure for all variants, regardless of whether it is relaxed or variant, consistently enhances model performance across various datasets. This improvement is especially striking in few-shot learning. However, performance deteriorates when a different relaxed mutant structure is used for each variant. Based on these results, we advise caution when integrating distinct mutant structures into ESM3 and similar models. This study highlights the critical need for careful consideration of structural inputs in protein binding affinity prediction.
Molecular Biology
What problem does this paper attempt to address?
This paper evaluates how structural information affects protein-protein interaction (PPI) prediction, specifically binding affinity prediction with the ESM3 model. The researchers examined how structures produced by different structure-acquisition pipelines (EvoEF2, Gromacs, and Rosetta Relax) influence ESM3's performance. They found that supplying the same structure for every variant significantly improves prediction performance, while supplying a different relaxed mutant structure for each variant degrades it. This finding underlines the importance of choosing structural inputs carefully in protein binding affinity prediction.

### Main problems

1. **The impact of structural information on PPI prediction**:
   - The researchers tested experimentally how structural information affects ESM3's prediction performance, focusing on the difference between using one shared structure and using distinct mutant structures.
2. **The effects of different structure-acquisition pipelines**:
   - Three structure-acquisition pipelines, EvoEF2, Gromacs, and Rosetta Relax, were evaluated for their influence on ESM3's performance. Using the same structure for all variants significantly improved performance, while using different mutant structures reduced it.
3. **The impact of structural variations on model sensitivity**:
   - The researchers also examined structural variations, such as structures generated by molecular dynamics simulations and structures perturbed with Gaussian noise, and found that even these minor changes led to a significant decline in model performance.
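The Gaussian-noise perturbation mentioned in the third point can be illustrated with a minimal sketch. It assumes noise is drawn independently for each Cartesian coordinate of each atom; the function names, the toy coordinates, and the noise scale `sigma=0.1` are hypothetical and are not taken from the paper, whose exact protocol is not described here.

```python
import math
import random


def add_gaussian_noise(coords, sigma, seed=None):
    """Return a copy of coords with N(0, sigma^2) noise added to every x, y, z value."""
    rng = random.Random(seed)  # local generator so results are reproducible per seed
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists (no alignment)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms (hypothetical, not from a real structure).
backbone = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.5)]
noisy = add_gaussian_noise(backbone, sigma=0.1, seed=0)
print(f"RMSD after perturbation: {rmsd(backbone, noisy):.3f} Å")
```

Reporting the RMSD between the original and perturbed coordinates is one simple way to quantify how "minor" a perturbation is before passing the structure to a model.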
### Formula

- Binding free energy change: \(\Delta\Delta G = \Delta G_{\text{MUT}} - \Delta G_{\text{WT}}\), where \(\Delta G_{\text{MUT}}\) is the binding free energy of the mutant and \(\Delta G_{\text{WT}}\) is the binding free energy of the wild type.

### Conclusion

- Using the same structure for all variants significantly improves ESM3's performance in predicting protein binding affinity.
- Using a different mutant structure for each variant reduces the model's performance.
- Structural variations, even minor ones, also have a significant impact on model performance.

These findings offer practical guidance on how to select and process structural inputs in future protein-protein interaction prediction.
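As a worked illustration of the \(\Delta\Delta G\) definition given in the Formula section, the sketch below computes the binding free energy change from two dissociation constants. It relies on the standard thermodynamic relation \(\Delta G = RT \ln K_d\) (with \(K_d\) in mol/L relative to a 1 M standard state), which is an assumption added here and is not stated in the summary; the \(K_d\) values and function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dg_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy from a dissociation constant: dG = RT ln(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def ddg(dg_mut, dg_wt):
    """Binding free energy change: ddG = dG_MUT - dG_WT."""
    return dg_mut - dg_wt


# Hypothetical example: the mutant binds ten times more weakly than the wild type.
kd_wt, kd_mut = 1e-9, 1e-8
value = ddg(dg_from_kd(kd_mut), dg_from_kd(kd_wt))
print(f"ddG = {value:.2f} kcal/mol")
```

Under this sign convention, a positive \(\Delta\Delta G\) means the mutation weakens binding, and a negative value means it strengthens binding.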