Thermodynamically Informed Multimodal Learning of High-Dimensional Free Energy Models in Molecular Coarse Graining

Blake R. Duschatko, Xiang Fu, Cameron Owen, Yu Xie, Albert Musaelian, Tommi Jaakkola, Boris Kozinsky
2024-05-30
Abstract: We present a differentiable formalism for learning free energies that is capable of capturing arbitrarily complex model dependencies on coarse-grained coordinates and finite-temperature response to variation of general system parameters. This is done by endowing models with explicit dependence on temperature and parameters and by exploiting exact differential thermodynamic relationships between the free energy, ensemble averages, and response properties. Formally, we derive an approach for learning high-dimensional cumulant generating functions using statistical estimates of their derivatives, which are observable cumulants of the underlying random variable. The proposed formalism opens ways to resolve several outstanding challenges in bottom-up molecular coarse graining dealing with multiple minima and state dependence. This is realized by using additional differential relationships in the loss function to significantly improve the learning of free energies, while exactly preserving the Boltzmann distribution governing the corresponding fine-grain all-atom system. As an example, we go beyond the standard force-matching procedure to demonstrate how leveraging the thermodynamic relationship between free energy and values of ensemble averaged all-atom potential energy improves the learning efficiency and accuracy of the free energy model. The result is significantly better sampling statistics of structural distribution functions. The theoretical framework presented here is demonstrated via implementations in both kernel-based and neural network machine learning regression methods and opens new ways to train accurate machine learning models for studying thermodynamic and response properties of complex molecular systems.
Computational Physics, Chemical Physics
What problem does this paper attempt to address?
The paper attempts to address several key challenges in molecular coarse-graining methods:

1. **Thermodynamic Representability**: Existing coarse-grained (CG) models typically learn thermodynamic properties (such as pressure, potential energy, or entropy) independently of the free energy function, so the all-atom (AA) and CG models are not exactly consistent with each other. The method proposed in this paper learns the potential of mean force and other thermodynamic properties in a strictly consistent manner, using exact differential thermodynamic relationships as constraints.
2. **Multimodal Learning Capability**: Current bottom-up coarse-graining methods usually train on average atomic forces alone (force matching), which makes learning the free energy inefficient and inaccurate. The proposed framework makes use of several types of training data at once, for example ensemble-averaged all-atom potential energies in addition to forces, and thereby improves the accuracy of the free energy and of its related thermodynamic properties simultaneously.
3. **Exploration of Response Properties**: Existing methods cannot describe how the system responds to external fields or to changes in other system parameters. The paper addresses this by giving differentiable models an explicit dependence on temperature and on those parameters, so that CG-level response properties can be learned for arbitrary parameter inputs.

In summary, the paper aims to improve the learning of high-dimensional free energy models through a new theoretical framework, focusing on learning the potential of mean force in molecular coarse graining, in order to improve both model accuracy and the efficiency of data use.
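The two thermodynamic relationships mentioned above (mean force as the negative gradient of the potential of mean force, and conditional mean potential energy as the temperature derivative of β times the potential of mean force) can be checked numerically on a toy system. The following is a minimal sketch and not the paper's implementation: the two-dimensional potential `potential(x, y)`, the choice of `x` as the coarse-grained coordinate, and all function names are assumptions made for illustration. It integrates out `y` by quadrature and compares finite-difference derivatives of the free energy with the corresponding ensemble averages.

```python
import numpy as np

# Hypothetical toy fine-grained potential (an assumption, not from the paper):
# a double well in the CG coordinate x, coupled to a fine-grained coordinate y.
def potential(x, y):
    return (x**2 - 1.0) ** 2 + 0.5 * (1.0 + x**2) * y**2

# Quadrature grid for the coordinate that is integrated out.
y = np.linspace(-8.0, 8.0, 4001)
dy = y[1] - y[0]

def beta_A(x, beta):
    """beta * A(x, beta) = -ln of the integral over y of exp(-beta * U(x, y))."""
    return -np.log(np.sum(np.exp(-beta * potential(x, y))) * dy)

def conditional_mean(obs, x, beta):
    """Boltzmann-weighted average of obs(y) at fixed CG coordinate x."""
    w = np.exp(-beta * potential(x, y))
    return np.sum(w * obs) / np.sum(w)

x0, beta0, h = 0.7, 1.3, 1e-4

# Force relation: -dA/dx equals the conditional mean of -dU/dx.
dU_dx = 4.0 * x0 * (x0**2 - 1.0) + x0 * y**2
mean_force = conditional_mean(-dU_dx, x0, beta0)
pmf_force = -(beta_A(x0 + h, beta0) - beta_A(x0 - h, beta0)) / (2.0 * h) / beta0

# Energy relation: d(beta * A)/d(beta) equals the conditional mean of U.
mean_energy = conditional_mean(potential(x0, y), x0, beta0)
dbetaA_dbeta = (beta_A(x0, beta0 + h) - beta_A(x0, beta0 - h)) / (2.0 * h)

print("force : ", mean_force, pmf_force)
print("energy: ", mean_energy, dbetaA_dbeta)
```

In a learning setting, both averages would be estimated from all-atom samples and used as regression targets for derivatives of a single differentiable free energy model. The sketch only confirms that the two targets are derivatives of the same function, which is what allows them to be combined in one loss.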