Assessing the accuracy and efficiency of free energy differences obtained from reweighted flow-based probabilistic generative models

Matteo Salvalaglio, Michael Shirts, Ahmad Y Sheikh, Yifei Michelle Liu, Nada Mehio, Edgar Olehnovics

DOI: https://doi.org/10.26434/chemrxiv-2024-z9g39

2024-04-22

Abstract: Computing free energy differences between metastable states characterized by non-overlapping Boltzmann distributions is often a computationally intensive endeavour, usually requiring chains of intermediate states to connect these metastable states. Targeted free energy perturbation (TFEP) can significantly lower the computational cost of FEP calculations by choosing a set of invertible maps used to directly transform the distributions of interest, achieving the necessary statistically significant overlaps without sampling any intermediate states. Probabilistic generative models (PGMs) based on normalising-flow architectures can make it much easier via machine learning to train invertible maps needed for TFEP. However, the accuracy and applicability of approaches based on empirically learned maps depend crucially on the choice of reweighting method adopted to estimate the free energy differences. In this work, we assess the accuracy, rate of convergence, and data efficiency of different free energy estimators, including exponential averaging, BAR, and MBAR, in reweighting PGMs trained by maximum likelihood on limited amounts of molecular dynamics data sampled only from end-states of interest. We carry out the comparisons on a set of simple but representative case studies, including conformational ensembles of alanine dipeptide and ibuprofen. Our results indicate that BAR and MBAR are both data efficient and robust, even in the presence of significant model overfitting in the generation of invertible maps. This analysis can serve as a stepping stone for the deployment of efficient and quantitatively accurate ML-based FE calculation methods in complex systems.

Chemistry

What problem does this paper attempt to address?

This paper evaluates the accuracy and efficiency of free energy differences computed by reweighting flow-based probabilistic generative models (PGMs). In molecular modeling, including drug design, estimating the relative thermodynamic stability of metastable states is computationally intensive when their Boltzmann distributions do not overlap, because conventional methods must sample chains of intermediate states connecting them. Targeted free energy perturbation (TFEP) reduces this cost by transforming the distributions of interest directly through invertible maps, which normalising-flow PGMs can learn from data. The paper investigates the accuracy, rate of convergence, and data efficiency of different free energy estimators, namely exponential averaging (EXP), the Bennett acceptance ratio (BAR), and the multistate Bennett acceptance ratio (MBAR), when reweighting PGMs trained by maximum likelihood on limited molecular dynamics data sampled only from the end states. Using a set of simple but representative case studies, including the conformational ensembles of alanine dipeptide and ibuprofen, the study shows that BAR and MBAR remain data efficient and robust even when the learned maps are significantly overfitted. The authors emphasize the importance of strategies to avoid overfitting and propose heuristics for detecting overfitting and for identifying statistically consistent free energy estimates when no reference value is available. They compare the quantitative accuracy and convergence properties of the standard free energy estimators on model systems of different dimensionality, benchmarking them against results obtained from biased molecular dynamics simulations, such as temperature-controlled metadynamics. In summary, the paper aims to fill gaps in the existing literature and to provide guidance for developing efficient and quantitatively accurate machine-learning-based free energy calculation methods for complex systems.
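To make the estimators named above concrete, here is a minimal sketch of EXP and BAR applied to a targeted (mapped) perturbation. It is not the authors' code or setup. The two 1D harmonic end states, the slightly wrong affine map standing in for a trained flow, and every function name are assumptions chosen for illustration. Energies are in reduced units (multiples of kT). MBAR is left out to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy end states: two 1D harmonic wells (reduced potentials).
MU_A, SIG_A = 0.0, 1.0
MU_B, SIG_B = 4.0, 0.5

def u_a(x):
    return 0.5 * ((x - MU_A) / SIG_A) ** 2

def u_b(y):
    return 0.5 * ((y - MU_B) / SIG_B) ** 2

# Hypothetical imperfect "learned" map M(x) = A*x + B, standing in for a flow.
# The exact map between these two states would use A = 0.5, B = 4.0.
A, B = 0.6, 3.9

def forward_work(x):
    # w_F = u_B(M(x)) - u_A(x) - ln|det J_M(x)|
    return u_b(A * x + B) - u_a(x) - np.log(abs(A))

def reverse_work(y):
    # w_R = u_A(M^-1(y)) - u_B(y) - ln|det J_{M^-1}(y)|, where the last
    # log-determinant equals -ln|A| for an affine map.
    return u_a((y - B) / A) - u_b(y) + np.log(abs(A))

def exp_estimate(w_f):
    # One-sided exponential averaging: df = -ln <exp(-w_F)>_A,
    # evaluated with a max shift for numerical stability.
    shift = np.max(-w_f)
    return -(shift + np.log(np.mean(np.exp(-w_f - shift))))

def fermi(z):
    # 1 / (1 + exp(z)), computed without overflow.
    return np.exp(-np.logaddexp(0.0, z))

def bar_estimate(w_f, w_r, tol=1e-10):
    # Solve the BAR self-consistency condition
    #   sum_F fermi(m + w_F - df) = sum_R fermi(-m + w_R + df),
    # with m = ln(N_F / N_R), by bisection. g(df) increases monotonically.
    m = np.log(len(w_f) / len(w_r))

    def g(df):
        return fermi(m + w_f - df).sum() - fermi(-m + w_r + df).sum()

    lo, hi = -1.0, 1.0
    while g(lo) > 0:   # widen the bracket until it contains the root
        lo *= 2.0
    while g(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Sample only the end states, as in TFEP; no intermediate states are used.
n = 5000
w_f = forward_work(rng.normal(MU_A, SIG_A, n))
w_r = reverse_work(rng.normal(MU_B, SIG_B, n))

df_exact = -np.log(SIG_B / SIG_A)   # analytic f_B - f_A for harmonic wells
df_exp = exp_estimate(w_f)
df_bar = bar_estimate(w_f, w_r)
print(f"exact {df_exact:.4f}  EXP {df_exp:.4f}  BAR {df_bar:.4f}")
```

In this toy case the mapped samples overlap the target well, so both estimators land near the analytic value. Making the map worse (for example, shifting `B` further from 4.0) is one way to see the one-sided EXP estimate degrade faster than the two-sided BAR estimate.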

