Abstract: To address the computational challenges of ab initio molecular dynamics and the accuracy limitations of empirical force fields, the introduction of machine learning force fields has proven effective in various systems including metals and inorganic materials. However, in large-scale organic systems, the application of machine learning force fields is often hindered by the complexity of long-range intermolecular interactions and molecular conformations, as well as the instability in long-time molecular simulations. Therefore, we propose a universal multiscale higher-order equivariant model combined with active learning techniques, efficiently capturing the complex long-range intermolecular interactions and molecular conformations. Compared to existing equivariant models, our model achieves the highest predictive accuracy, and order-of-magnitude improvements in computational speed and memory efficiency. In addition, a bond length stretching method is designed to improve the stability of long-time molecular simulations. Utilizing only 901 samples from a dataset with 120 atoms, our model successfully extends high precision to systems with hundreds of thousands of atoms. These achievements guarantee high predictive accuracy, fast simulation speed, minimal memory consumption, and robust simulation stability, satisfying the requirements for high-precision and long-time molecular simulations in large-scale organic systems.
What problem does this paper attempt to address?
This paper addresses the computational challenges of molecular simulation in large-scale organic systems, specifically the high computational cost of ab initio molecular dynamics (AIMD) and the limited accuracy of empirical force fields. The researchers propose a universal multiscale higher-order equivariant model, combined with active learning techniques, to efficiently capture complex long-range intermolecular interactions and molecular conformations. Compared to existing equivariant models, the new model achieves the highest predictive accuracy along with order-of-magnitude improvements in computational speed and memory efficiency.
Current machine learning force field (MLFF) methods have improved prediction accuracy, but they often underperform on large-scale organic systems because of the complexity of long-range interactions, the diversity of molecular conformations, and instability in long-time simulations. The paper notes that adding interaction layers to expand the network's receptive field can improve prediction accuracy, but it increases computation and may lead to oversmoothing. Likewise, simply enlarging model hyperparameters such as the cutoff radius increases computation time and memory consumption.
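The trade-off described above can be made concrete with a rough estimate. The sketch below is not from the paper: it assumes a uniform atomic number density, and the function names and example numbers are purely illustrative. It shows that stacking layers extends the reach of a message-passing network linearly, while enlarging the cutoff radius grows the number of neighbors per atom cubically.

```python
import math


def receptive_field(cutoff: float, num_layers: int) -> float:
    """Approximate reach of a message-passing network.

    Each interaction layer passes messages over one cutoff radius,
    so the reach grows linearly with the number of layers.
    """
    return cutoff * num_layers


def neighbors_per_atom(cutoff: float, density: float) -> float:
    """Expected neighbor count inside the cutoff sphere.

    Assumes a uniform number density (atoms per cubic angstrom),
    which is an idealization made only for this illustration.
    """
    return density * (4.0 / 3.0) * math.pi * cutoff ** 3


if __name__ == "__main__":
    density = 0.1  # hypothetical value, atoms per cubic angstrom
    for cutoff in (5.0, 10.0):
        print(cutoff, receptive_field(cutoff, 3), neighbors_per_atom(cutoff, density))
```

Under these assumptions, doubling the cutoff radius multiplies the per-atom neighbor count, and with it the message-passing cost and memory, by a factor of eight.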
To address these issues, the paper proposes a multiscale higher-order equivariant model that pairs a high-order equivariant model with a low-cost module for long-range interactions. By reducing node channel dimensions and lowering the order of the directional expansion, the model can process long-range messages efficiently while keeping local and global features consistent. The paper also introduces an active learning strategy based on query by committee to collect training data more efficiently. To improve stability in long-time simulations, the researchers design a bond-stretching method that augments the training set with molecular conformations containing abnormal bond lengths, so that simulations do not crash when such geometries arise.
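The committee-based selection step can be sketched as follows. This is a minimal, generic query-by-committee example and not the paper's implementation: the disagreement measure (largest per-atom standard deviation of predicted forces across committee members), the threshold, and all names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]  # one force vector (fx, fy, fz)


def max_force_deviation(committee_forces: Sequence[Sequence[Vector]]) -> float:
    """Largest per-atom force disagreement across a committee.

    committee_forces[m][i] is the force on atom i predicted by model m.
    For each atom, the standard deviation of the force vector over the
    committee is computed; the maximum over atoms is returned.
    """
    n_models = len(committee_forces)
    n_atoms = len(committee_forces[0])
    worst = 0.0
    for i in range(n_atoms):
        mean = [
            sum(committee_forces[m][i][k] for m in range(n_models)) / n_models
            for k in range(3)
        ]
        variance = sum(
            sum((committee_forces[m][i][k] - mean[k]) ** 2 for k in range(3))
            for m in range(n_models)
        ) / n_models
        worst = max(worst, math.sqrt(variance))
    return worst


def select_for_labeling(candidates: Sequence, threshold: float) -> List[int]:
    """Indices of candidate configurations the committee disagrees on.

    Each candidate holds the committee's force predictions for one
    configuration; those above the (hypothetical) threshold would be
    sent for reference calculations and added to the training set.
    """
    return [
        index
        for index, forces in enumerate(candidates)
        if max_force_deviation(forces) > threshold
    ]
```

Configurations on which the committee agrees are skipped, so reference calculations are spent only where the current models are uncertain.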
Experimental results show strong performance in energy and force prediction accuracy, simulation speed, memory consumption, and simulation stability. In the example of interacting formaldehyde molecules, the model predicts energy changes more accurately than approaches that only add interaction layers or enlarge the cutoff radius. In terms of simulation stability, the model handles extreme bond lengths effectively, and the bond-stretching method significantly improves the success rate of long-time molecular dynamics simulations.
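The summary does not spell out the bond-stretching procedure, so the following is only a minimal sketch of the general idea under stated assumptions: a chosen bond is rescaled by moving one of its atoms along the bond direction, and the resulting geometry would then be labeled by a reference calculation and added to the training set. The function name, the choice to move only one atom, and the scaling factor are all hypothetical.

```python
import math
from typing import List, Sequence


def stretch_bond(
    positions: Sequence[Sequence[float]], i: int, j: int, factor: float
) -> List[List[float]]:
    """Return a copy of the geometry with the i-j bond length scaled.

    Atom i stays fixed and atom j is moved along the bond vector so
    that the new bond length equals factor times the original length.
    All other atoms are left unchanged (a simplification made here).
    """
    new_positions = [list(p) for p in positions]
    bond = [positions[j][k] - positions[i][k] for k in range(3)]
    new_positions[j] = [positions[i][k] + factor * bond[k] for k in range(3)]
    return new_positions


if __name__ == "__main__":
    # Hypothetical two-atom geometry in angstroms.
    geometry = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    stretched = stretch_bond(geometry, 0, 1, 1.5)
    print(math.dist(stretched[0], stretched[1]))
```

Including such stretched or compressed geometries in training gives the model examples of bond lengths that ordinary equilibrium sampling rarely visits.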
In conclusion, this paper aims to address efficiency, accuracy, and stability issues in machine learning force field models for large-scale organic systems. Through innovative model architecture and data collection strategies, it provides an effective tool for high-precision, long-time molecular simulations.