Machine learning for molecular simulation

Frank Noé, Alexandre Tkatchenko, Klaus-Robert Müller, Cecilia Clementi
DOI: https://doi.org/10.1146/annurev-physchem-042018-052331
2019-11-07
Abstract: Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for a machine learning revolution and have already been profoundly impacted by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, coarse-grained molecular dynamics, the extraction of free energy surfaces and kinetics, and generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into machine learning structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.
Chemical Physics, Machine Learning, Computational Physics, Quantum Physics
What problem does this paper attempt to address?
The paper addresses the complex, time-consuming calculations in molecular simulation, which are well suited to machine learning (ML) methods. It focuses on the following aspects:

1. **Potential Energy Surfaces (PES)**:
   - **Problem**: In molecular dynamics (MD) and Markov chain Monte Carlo (MCMC) simulations that use classical force fields under the Born-Oppenheimer approximation, predictive power is limited by the accuracy of the underlying potential energy surface (PES). Classical PES models often lack transferability and are accurate only for geometries close to those used in fitting.
   - **Solution**: Use ML methods, especially deep neural networks, to build models that accurately reproduce the global PES. Their parameters are optimized by energy matching or force matching against reference data, which improves prediction accuracy.

2. **Free Energy Surfaces (FES)**:
   - **Problem**: Computing the free energy of a system as a function of its collective variables is an important task, but the required high-dimensional integral cannot be evaluated analytically in practice.
   - **Solution**: Estimate the free energy and its gradient with ML methods such as kernel regression and neural networks, and from these reconstruct the entire free energy surface. Combined with enhanced sampling methods, the free energy surface can also be learned on the fly during the simulation.

3. **Coarse-graining**:
   - **Problem**: Atomistic simulations are very expensive, especially for complex molecular systems such as proteins.
   - **Solution**: Design coarse-grained models that map the atomistic system onto a smaller number of effective "beads". Use ML methods to define the energy function of the coarse-grained model so that it is thermodynamically consistent with the atomistic model.

4. **Kinetics**:
   - **Problem**: Molecular kinetics usually include slow processes, and simulating these directly requires large computational resources.
   - **Solution**: Learn the kinetics from a given set of trajectory data with ML methods and construct low-dimensional kinetic propagators, which simplifies analysis and interpretation.

5. **Sampling and Thermodynamics**:
   - **Problem**: Conformational changes related to molecular function are usually rare events, and simulating them directly requires extremely long simulation times.
   - **Solution**: Use generative learning methods, such as variational autoencoders (VAEs), generative adversarial networks (GANs), and flow models, to generate equilibrium samples or statistically independent samples efficiently, sidestepping the rare-event sampling problem.

6. **Incorporating Physics into Machine Learning**:
   - **Problem**: Known physical principles must be incorporated into ML models so that their predictions remain physically meaningful.
   - **Solution**: Improve the robustness and prediction accuracy of the model either through data augmentation or by building physical symmetries and invariances directly into the model.

Overall, the paper reviews how ML methods, especially deep neural networks, can address these key problems in molecular simulation and improve its efficiency and accuracy.
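The force-matching idea mentioned for potential energy surfaces can be illustrated with a minimal sketch. Everything here is an assumption chosen for illustration and is not taken from the paper: the one-dimensional double-well potential, the polynomial energy model (the paper focuses on neural networks), and the function names `reference_force`, `fit_force_matching`, and `model_energy`. The model energy is U(x) = sum_k theta_k x^k, its force is -dU/dx, and the parameters are chosen to minimize the squared difference between model and reference forces.

```python
import numpy as np


def reference_force(x):
    """Reference force of a toy double well U(x) = x**4 - 2*x**2 (illustrative only)."""
    return -(4.0 * x**3 - 4.0 * x)


def fit_force_matching(x, forces, degree=4):
    """Fit U(x) = sum_k theta_k * x**k (k = 1..degree) by force matching.

    The model force -dU/dx = -sum_k theta_k * k * x**(k-1) is linear in theta,
    so minimizing the squared force error is a linear least-squares problem.
    """
    powers = np.arange(1, degree + 1)
    # Column k of the design matrix is the force contribution of theta_k.
    design = -powers * x[:, None] ** (powers - 1)
    theta, *_ = np.linalg.lstsq(design, forces, rcond=None)
    return theta


def model_energy(x, theta):
    """Evaluate the fitted energy; the constant offset is fixed to zero."""
    powers = np.arange(1, len(theta) + 1)
    return (x[:, None] ** powers) @ theta


# Hypothetical training configurations and their reference forces.
x_train = np.linspace(-1.5, 1.5, 50)
theta = fit_force_matching(x_train, reference_force(x_train))
print(theta)  # coefficients of x, x**2, x**3, x**4
```

Because forces only constrain the derivative of the energy, force matching leaves the constant energy offset undetermined; this sketch simply omits the constant term. With a neural-network energy model the same loss would typically be minimized by gradient descent, with forces obtained through automatic differentiation.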