Effective Data-Driven Collective Variables for Free Energy Calculations from Metadynamics of Paths

Lukas Müllender, Andrea Rizzi, Michele Parrinello, Paolo Carloni, Davide Mandelli
2024-04-08
Abstract: A variety of enhanced sampling methods predict multidimensional free energy landscapes associated with biological and other molecular processes as a function of a few selected collective variables (CVs). The accuracy of these methods is crucially dependent on the ability of the chosen CVs to capture the relevant slow degrees of freedom of the system. For complex processes, finding such CVs is the real challenge. Machine learning (ML) CVs offer, in principle, a solution to this problem. However, these methods rely on the availability of high-quality datasets -- ideally incorporating information about physical pathways and transition states -- which are difficult to access, greatly limiting their domain of application. Here, we demonstrate how these datasets can be generated by means of enhanced sampling simulations in trajectory space via the metadynamics of paths [arXiv:2002.09281] algorithm. The approach is expected to provide a general and efficient way to generate effective ML-based CVs for the fast prediction of free energy landscapes in enhanced sampling simulations. We demonstrate our approach with two numerical examples, a two-dimensional model potential and the isomerization of alanine dipeptide, using deep targeted discriminant analysis as our ML-based CV of choice.
Computational Physics
What problem does this paper attempt to address?
This paper addresses how to select and generate effective data-driven collective variables (CVs) for free energy calculations. In enhanced sampling methods, CVs are the coordinates along which the multidimensional free energy landscape of a complex biological or molecular process is described. Finding CVs that capture the system's key slow degrees of freedom is difficult. Machine learning (ML) CVs can in principle solve this problem, but they require high-quality datasets, ideally ones that include information about physical pathways and transition states, and such data are often hard to obtain.

The paper proposes using the Metadynamics of Paths (MoP) algorithm to perform enhanced sampling simulations in trajectory space and thereby generate the required dataset. The resulting ML-based CVs can then be used to predict free energy landscapes quickly in enhanced sampling simulations. The authors demonstrate the method with two numerical examples, a two-dimensional model potential and the isomerization of alanine dipeptide, using Deep Targeted Discriminant Analysis (DeepTDA) as the ML-based CV.

The research process consists of the following steps:
1. Collect data from standard MD simulations of the initial and final states, and train a DeepTDA model on them to obtain a first CV.
2. Build a trajectory-space CV for the MoP simulations from the CV obtained in the first step.
3. Analyze the results of the MoP simulations to identify new metastable states and transition paths, use these data to train a new DeepTDA CV, and iterate until an effective CV is found.

With this procedure, MoP can sample transition paths effectively even when the initial CV is not ideal, and it supplies data on transition states for training more efficient configuration-space CVs. The method has the potential to accelerate free energy landscape calculations in complex systems and to aid the understanding of important molecular transformations in fields such as biology, chemistry, and materials science.
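The DeepTDA training step in the workflow above can be illustrated with a small sketch of a targeted-discriminant-style loss. This is not the authors' implementation: it assumes a one-dimensional CV, uses the per-state standard deviation as the measure of spread, and the function name `tda_loss`, the weights `alpha` and `beta`, and the target values are all hypothetical choices made for illustration. The idea shown is only that the CV values of each labeled state are pushed toward a preassigned target center and width, so that the states become well separated along the CV.

```python
import numpy as np

def tda_loss(cv_values, labels, target_means, target_sigmas,
             alpha=1.0, beta=1.0):
    """Toy targeted-discriminant loss for a 1D CV (illustrative only).

    cv_values     : CV value of each sampled configuration
    labels        : integer state label (0, 1, ...) of each configuration
    target_means  : desired CV center for each state (assumed values)
    target_sigmas : desired CV spread for each state (assumed values)
    alpha, beta   : hypothetical weights of the center and spread terms
    """
    cv_values = np.asarray(cv_values, dtype=float)
    labels = np.asarray(labels)
    loss = 0.0
    for k, (mu_tg, sigma_tg) in enumerate(zip(target_means, target_sigmas)):
        s_k = cv_values[labels == k]  # CV values belonging to state k
        loss += alpha * (s_k.mean() - mu_tg) ** 2   # penalize off-target center
        loss += beta * (s_k.std() - sigma_tg) ** 2  # penalize off-target spread
    return loss

# Two states whose CV distributions already match the targets.
cv = [-7.2, -6.8, 6.8, 7.2]
state = [0, 0, 1, 1]
matched = tda_loss(cv, state, target_means=[-7.0, 7.0],
                   target_sigmas=[0.2, 0.2])

# Two states that overlap at zero and are far from their targets.
overlapping = tda_loss([0.0, 0.0], [0, 1], target_means=[-1.0, 1.0],
                       target_sigmas=[0.0, 0.0])
```

In an actual DeepTDA setup the CV values would be the output of a neural network and the loss would be minimized with respect to the network parameters. In the iterative scheme described above, configurations harvested from the MoP simulations would enter as additional labeled states.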