Comparing machine learning potentials for water: Kernel-based regression and Behler-Parrinello neural networks

Pablo Montero de Hijes, Christoph Dellago, Ryosuke Jinnouchi, Bernhard Schmiedmayer, Georg Kresse
2023-12-23
Abstract: In this paper we investigate the performance of different machine learning potentials (MLPs) in predicting key thermodynamic properties of water using RPBE+D3. Specifically, we scrutinize kernel-based regression and high-dimensional neural networks trained on a highly accurate dataset consisting of about 1,500 structures, as well as a smaller dataset, about half the size, obtained using only on-the-fly learning. The study reveals that despite minor differences between the MLPs, their agreement on observables such as the diffusion constant and pair-correlation functions is excellent, especially for the large training dataset. Variations in the predicted density isobars, albeit somewhat larger, are also acceptable, particularly given the errors inherent to approximate density functional theory. Overall, the study emphasizes the relevance of the database over the fitting method. Finally, the study underscores the limitations of root mean square errors and the need for comprehensive testing, advocating the use of multiple MLPs for enhanced certainty, particularly when simulating complex thermodynamic properties that may not be fully captured by simpler tests.
Soft Condensed Matter
What problem does this paper attempt to address?
The paper primarily explores the performance of different Machine Learning Potentials (MLPs) in predicting key thermodynamic properties of water, specifically focusing on kernel-based regression and Behler-Parrinello Neural Network (BPNNP) methods. The main objectives of the study include:

1. **Comparison of the performance of two MLPs**: By comparing kernel-based regression (KbP) and BPNNP, the study evaluates their performance in predicting the diffusion constant, pair correlation function, and density-related properties of water.
2. **Importance of the database**: The study emphasizes that the training dataset's importance surpasses the choice of the specific fitting method itself. It was found that despite minor differences between the two methods, their predictions on key observables were very consistent, especially when using large-scale training datasets.
3. **Assessment of errors introduced by MLPs**: The research aims to determine whether using MLPs introduces significant errors in the observables. The results show that the errors introduced by the two different MLPs are almost identical and very small, which is promising for future applications in materials science and condensed matter physics.
4. **Limitations of benchmark testing**: The study points out that traditional Root Mean Square Error (RMSE) metrics may be insufficient to comprehensively evaluate the performance of MLPs, suggesting the need for more extensive testing to ensure accuracy in simulating complex thermodynamic properties.
5. **Data acquisition and processing**: The paper details how training data was obtained through first-principles calculations, including the computational methods used (such as the Projector Augmented-Wave (PAW) potentials in VASP software), the choice of density functional theory (RPBE + D3 functional), and the data preprocessing steps.
6. **Prediction of physical properties**: The two machine learning potentials were used to predict the structural, thermodynamic, and dynamic properties of water, including the maximum density, melting temperature, radial distribution function, and self-diffusion coefficient.

In summary, this paper aims to predict key thermodynamic properties of water by comparing and analyzing two machine learning potentials, emphasizing the importance of high-quality databases, and highlighting the limitations of traditional evaluation metrics. It provides valuable insights for further development of machine learning potentials in condensed matter physics and materials science.
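To make the distinction between a fitting metric and a simulated observable concrete, the following is a minimal Python sketch, not the authors' workflow. It contrasts a force RMSE, which is a single number computed on a test set, with a self-diffusion coefficient estimated from a trajectory through the Einstein relation MSD(t) ≈ 6Dt in three dimensions. All function names and the `fit_start` parameter are hypothetical, and the sketch assumes unwrapped coordinates and a single time origin, which is noisier than the multi-origin averaging usually used in production analyses.

```python
import numpy as np


def force_rmse(f_pred, f_ref):
    """Root mean square error over all force components (same units as input)."""
    diff = np.asarray(f_pred, dtype=float) - np.asarray(f_ref, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mean_squared_displacement(positions):
    """MSD relative to the first frame, averaged over particles.

    positions: array of shape (n_frames, n_particles, 3) with UNWRAPPED
    coordinates (assumption: periodic-boundary jumps were already removed).
    """
    pos = np.asarray(positions, dtype=float)
    disp = pos - pos[0]                      # displacement from frame 0
    return np.mean(np.sum(disp ** 2, axis=2), axis=1)


def self_diffusion_coefficient(positions, dt, fit_start=0.5):
    """Estimate D from the Einstein relation MSD(t) ~ 6 D t (3D).

    A straight line is fitted to the MSD over the last (1 - fit_start)
    fraction of the frames; fit_start is a hypothetical knob for skipping
    the short-time, non-diffusive regime.
    """
    msd = mean_squared_displacement(positions)
    t = np.arange(len(msd)) * dt
    first = int(fit_start * len(msd))
    slope, _intercept = np.polyfit(t[first:], msd[first:], 1)
    return float(slope / 6.0)


if __name__ == "__main__":
    # Synthetic random walk standing in for an MD trajectory (toy data only).
    rng = np.random.default_rng(0)
    sigma, dt = 0.1, 1.0
    steps = rng.normal(0.0, sigma, size=(200, 2000, 3))
    traj = np.concatenate([np.zeros((1, 2000, 3)), np.cumsum(steps, axis=0)])
    print("estimated D:", self_diffusion_coefficient(traj, dt))
    print("expected  D:", sigma ** 2 / (2.0 * dt))
```

With positions in Å and `dt` in ps, the returned value is in Å²/ps, where 1 Å²/ps equals 10⁻⁴ cm²/s. Applying the same estimator to trajectories generated by two different potentials gives a direct comparison on an observable, which is the kind of check the summary contrasts with relying on RMSE alone.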