Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning

Weile Jia, Han Wang, Mohan Chen, Denghui Lu, Lin Lin, Roberto Car, Weinan E, Linfeng Zhang
2020-09-14
Abstract: For 35 years, ab initio molecular dynamics (AIMD) has been the method of choice for modeling complex atomistic phenomena from first principles. However, most AIMD applications are limited by computational cost to systems with thousands of atoms at most. We report that a machine learning-based simulation protocol (Deep Potential Molecular Dynamics), while retaining ab initio accuracy, can simulate more than 1 nanosecond-long trajectory of over 100 million atoms per day, using a highly optimized code (GPU DeePMD-kit) on the Summit supercomputer. Our code can efficiently scale up to the entire Summit supercomputer, attaining 91 PFLOPS in double precision (45.5% of the peak) and 162/275 PFLOPS in mixed-single/half precision. The great accomplishment of this work is that it opens the door to simulating unprecedented size and time scales with ab initio accuracy. It also poses new challenges to the next-generation supercomputer for a better integration of machine learning and physical modeling.
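The performance figures in the abstract can be related to each other with a little arithmetic. The short sketch below uses only the numbers quoted above; the variable names are illustrative, and the ratios compare reported FLOPS rates, which is not necessarily the same as a wall-clock speedup.

```python
# Figures quoted in the abstract.
double_pflops = 91.0         # sustained rate in double precision
fraction_of_peak = 0.455     # 45.5% of the peak
mixed_single_pflops = 162.0  # mixed-single precision
mixed_half_pflops = 275.0    # mixed-half precision

# Peak implied by "91 PFLOPS is 45.5% of the peak".
implied_peak_pflops = double_pflops / fraction_of_peak

# Ratio of the reported mixed-precision rates to the double-precision rate.
ratio_single = mixed_single_pflops / double_pflops
ratio_half = mixed_half_pflops / double_pflops

print(f"implied double-precision peak: {implied_peak_pflops:.0f} PFLOPS")  # 200 PFLOPS
print(f"mixed-single / double: {ratio_single:.2f}")  # 1.78
print(f"mixed-half / double:   {ratio_half:.2f}")    # 3.02
```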
Computational Physics
What problem does this paper attempt to address?
### Main Problems Addressed by the Paper

The primary goal of this paper is to extend molecular dynamics simulations with ab initio accuracy to much larger systems by combining machine learning (specifically Deep Potential Molecular Dynamics, or DeePMD) with high-performance computing. The result is the ability to simulate systems of over 100 million atoms while retaining ab initio accuracy.

### Specific Problem Description

- **Limitations of Traditional Methods**: Ab initio molecular dynamics (AIMD) is a crucial tool for understanding and simulating complex atomic processes in materials and molecules. However, its computational cost typically limits it to systems of a few thousand atoms at most.
- **Computational Cost Issue**: The cost of AIMD grows cubically with the number of electronic degrees of freedom, so even the fastest supercomputers increase the tractable system size only by a small factor.
- **Practical Needs**: In fields such as complex chemical reactions, electrochemical batteries, and nanocrystalline materials, the systems of interest often contain tens to hundreds of millions of atoms and must be followed for microseconds or longer, far beyond the capabilities of AIMD.
- **Insufficiencies of Existing Solutions**: Empirical force fields (EFF) can scale to larger systems but have limited accuracy. Linear-scaling density functional theory (DFT) has improved but still carries a large prefactor, making it unsuitable for long time scales.

### Solution Overview

- **Deep Potential Molecular Dynamics (DeePMD)**: A machine learning-based molecular dynamics protocol that trains deep neural networks to reproduce interatomic forces at ab initio accuracy. The method retains AIMD accuracy while achieving efficiency close to that of empirical force fields.
- **High-Performance Computing Optimization**: The authors developed a highly optimized code (GPU DeePMD-kit) that runs efficiently on the Summit supercomputer and can simulate more than 1 nanosecond of trajectory per day for a system of over 100 million atoms.
- **Algorithm and Implementation Innovations**:
  - Increased computational granularity by redesigning the data layout of the neighbor list to avoid computational branching.
  - Optimized custom TensorFlow operators to improve execution efficiency on GPUs.
  - Implemented mixed-precision computation to reduce memory bandwidth requirements without sacrificing the accuracy of physical quantity predictions.
  - Reduced MPI communication bottlenecks by optimizing communication in the ghost region and the collection of global physical properties.

In summary, the paper addresses the main obstacles to large-scale molecular dynamics with ab initio accuracy and reaches system sizes and time scales that were previously out of reach for complex chemistry and materials science research.
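To make the neighbor-list point above more concrete, here is a minimal sketch of one common way to obtain a branch-free layout: every atom gets the same fixed number of neighbor slots, and unused slots are padded and switched off with a mask. This is an illustrative assumption and not the actual data layout in GPU DeePMD-kit; the function names, the `max_nbrs` parameter, and the toy pair term (a plain sum of distances) are all hypothetical.

```python
import math

def build_padded_neighbor_list(positions, cutoff, max_nbrs):
    """Return (nbr_idx, nbr_mask) with exactly max_nbrs slots per atom.

    Real neighbors are stored nearest first. Unused slots hold index 0
    and mask 0.0, so later loops never need a per-atom neighbor count.
    """
    n = len(positions)
    nbr_idx = [[0] * max_nbrs for _ in range(n)]
    nbr_mask = [[0.0] * max_nbrs for _ in range(n)]
    for i in range(n):
        # Brute-force O(N^2) search; fine for a toy example.
        found = sorted(
            (math.dist(positions[i], positions[j]), j)
            for j in range(n)
            if j != i and math.dist(positions[i], positions[j]) < cutoff
        )
        # Neighbors beyond max_nbrs are dropped in this sketch.
        for slot, (_, j) in enumerate(found[:max_nbrs]):
            nbr_idx[i][slot] = j
            nbr_mask[i][slot] = 1.0
    return nbr_idx, nbr_mask

def masked_pair_sum(positions, nbr_idx, nbr_mask):
    """Sum a toy pair term over all slots; padded slots are zeroed by the mask.

    Every atom runs the same number of iterations with no "is this slot
    valid?" test, which is the property that maps well onto GPU threads.
    """
    total = 0.0
    for i, row in enumerate(nbr_idx):
        for slot, j in enumerate(row):
            r = math.dist(positions[i], positions[j])
            total += nbr_mask[i][slot] * r
    return total

positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (10.0, 0.0, 0.0)]
nbr_idx, nbr_mask = build_padded_neighbor_list(positions, cutoff=3.0, max_nbrs=3)
print(nbr_idx[0], nbr_mask[0])  # [1, 2, 0] [1.0, 1.0, 0.0]
print(masked_pair_sum(positions, nbr_idx, nbr_mask))
```

The trade-off in such a layout is some wasted work on padded slots in exchange for uniform, predictable memory access and control flow.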