Comparing machine learning potentials for water: Kernel-based regression and Behler–Parrinello neural networks

Pablo Montero de Hijes, Christoph Dellago, Ryosuke Jinnouchi, Bernhard Schmiedmayer, Georg Kresse
DOI: https://doi.org/10.1063/5.0197105
IF: 4.304
2024-03-20
The Journal of Chemical Physics
Abstract: In this paper, we investigate the performance of different machine learning potentials (MLPs) in predicting key thermodynamic properties of water using RPBE + D3. Specifically, we scrutinize kernel-based regression and high-dimensional neural networks trained on a highly accurate dataset consisting of about 1500 structures, as well as a smaller dataset, about half the size, obtained using only on-the-fly learning. This study reveals that despite minor differences between the MLPs, their agreement on observables such as the diffusion constant and pair-correlation functions is excellent, especially for the large training dataset. Variations in the predicted density isobars, albeit somewhat larger, are also acceptable, particularly given the errors inherent to approximate density functional theory. Overall, this study emphasizes the relevance of the database over the fitting method. Finally, this study underscores the limitations of root mean square errors and the need for comprehensive testing, advocating the use of multiple MLPs for enhanced certainty, particularly when simulating complex thermodynamic properties that may not be fully captured by simpler tests.
chemistry, physical; physics, atomic, molecular & chemical
What problem does this paper attempt to address?
The paper compares how well different machine learning potentials (MLPs) predict key thermodynamic properties of water described with RPBE + D3, specifically kernel-based regression and Behler-Parrinello neural networks. Two training sets are used: a highly accurate dataset of about 1500 structures, and a smaller one, about half that size, obtained using only on-the-fly learning. Despite minor differences between the MLPs, they agree excellently on observables such as the diffusion constant and pair-correlation functions, especially when trained on the larger dataset. Variations in the predicted density isobars are somewhat larger but still acceptable given the errors inherent to approximate density functional theory. Overall, the study finds that the training database matters more than the fitting method. It also highlights the limitations of root mean square error (RMSE) as an evaluation metric and the need for comprehensive testing, and it advocates using multiple MLPs to increase certainty when simulating complex thermodynamic properties that simpler tests may not fully capture.
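The remark about RMSE can be illustrated with a generic toy example. The following is a minimal sketch, not taken from the paper: the numbers are hypothetical and the helper names `rmse` and `mean_error` are assumptions made for illustration. It shows two sets of predictions with the same RMSE against a reference, one with a constant offset and one whose errors alternate in sign, which is one general reason a single RMSE value may not distinguish models that behave differently.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mean_error(predicted, reference):
    """Signed mean error, i.e. the systematic bias of the predictions."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(p - r for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical values in arbitrary units; NOT data from the paper.
reference = [0.0, 0.0, 0.0, 0.0]
model_a = [0.1, 0.1, 0.1, 0.1]    # constant offset: a systematic bias
model_b = [0.1, -0.1, 0.1, -0.1]  # alternating sign: errors cancel on average

rmse_a = rmse(model_a, reference)
rmse_b = rmse(model_b, reference)
bias_a = mean_error(model_a, reference)
bias_b = mean_error(model_b, reference)

print(f"model A: RMSE = {rmse_a:.3f}, mean error = {bias_a:+.3f}")
print(f"model B: RMSE = {rmse_b:.3f}, mean error = {bias_b:+.3f}")
```

Both hypothetical models report the same RMSE, yet only the first carries a systematic shift, which is the kind of difference that would only show up when a downstream observable is compared directly.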