Training One DeePMD Model in Minutes: a Step Towards Online Learning.

Siyu Hu,Tong Zhao,Qiuchen Sha,Enji Li,Xiangyu Meng,Liping Liu,Wang,Guangming Tan,Weile Jia
DOI: https://doi.org/10.1145/3627535.3638505
2024-01-01
Abstract:Neural Network Molecular Dynamics (NNMD) has become a major approach in material simulations, which can speedup the molecular dynamics (MD) simulation for thousands of times, while maintaining ab initio accuracy, thus has a potential to fundamentally change the paradigm of material simulations. However, there are two time-consuming bottlenecks of the NNMD developments. One is the data access of ab initio calculation results. The other, which is the focus of the current work, is reducing the training time of NNMD model. The training of NNMD model is different from most other neural network training because the atomic force (which is related to the gradient of the network) is an important physical property to be fit. Tests show the traditional stochastic gradient methods, like the Adam algorithms, cannot efficiently deploy the multisample minibatch algorithm. As a result, a typical training (taking the Deep Potential Molecular Dynamics (DeePMD) as an example) can take many hours. In this work, we designed a heuristic minibatch quasi-Newtonian optimizer based on Extended Kalman Filter method. An early reduction of gradient and error is adopted to reduce memory footprint and communication. The memory footprint, communication and settings of hyper-parameters of this new method are analyzed in detail. Computational innovations such as customized kernels of the symmetry-preserving descriptor are applied to exploit the computing power of the heterogeneous architecture. Experiments are performed on 8 different datasets representing different real case situations, and numerical results show that our new method has an average speedup of 32.2 compared to the Reorganized Layer-wised Extended Kalman Filter with 1 GPU, reducing the absolute training time of one DeePMD model from hours to several minutes, making it one step toward online training.
What problem does this paper attempt to address?