Accurate machine learning force fields via experimental and simulation data fusion

Sebastien Röcken, Julija Zavadlav
DOI: https://doi.org/10.1038/s41524-024-01251-4
2023-08-18
Abstract: Machine Learning (ML)-based force fields are attracting ever-increasing interest due to their capacity to span spatiotemporal scales of classical interatomic potentials at quantum-level accuracy. They can be trained based on high-fidelity simulations or experiments, the former being the common case. However, both approaches are impaired by scarce and erroneous data resulting in models that either do not agree with well-known experimental observations or are under-constrained and only reproduce some properties. Here we leverage both Density Functional Theory (DFT) calculations and experimentally measured mechanical properties and lattice parameters to train an ML potential of titanium. We demonstrate that the fused data learning strategy can concurrently satisfy all target objectives, thus resulting in a molecular model of higher accuracy compared to the models trained with a single data source. The inaccuracies of DFT functionals at target experimental properties were corrected, while the investigated off-target properties remained largely unperturbed. Our approach is applicable to any material and can serve as a general strategy to obtain highly accurate ML potentials.
Chemical Physics, Machine Learning, Computational Physics
What problem does this paper attempt to address?
This paper addresses how to improve the accuracy of machine learning (ML) force fields by training on experimental and simulation data together. ML force fields are usually trained on either high-fidelity simulations or experiments. Both sources suffer from scarce and erroneous data, so the resulting models either disagree with well-known experimental observations or are under-constrained and reproduce only some properties.
The researchers train an ML potential for titanium on two data sources at once: density functional theory (DFT) calculations, and experimentally measured mechanical properties and lattice parameters. The model is a graph neural network (GNN) potential, and training alternates iteratively between a DFT trainer and an experimental trainer so that the same parameters learn from both simulated and experimental data. The fused model satisfies all target objectives concurrently and is more accurate than models trained on a single data source. It corrects the inaccuracies of the DFT functionals for the targeted experimental properties, while the investigated off-target properties remain largely unperturbed.
The paper also emphasizes the importance of training dataset size, system size, and long-range interactions. Data exclusion experiments examine how the amount of experimental data affects the model and how well it transfers across temperatures. They indicate that adding more diverse experimental data is more beneficial than densely sampling a single property.
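The alternating schedule described above, in which a DFT trainer and an experimental trainer take turns updating the same parameters, can be illustrated with a deliberately tiny sketch. It is not the paper's implementation. It assumes a two-parameter 1D spring model in place of the GNN, made-up reference values (`K_DFT`, `R0_DFT`, `K_EXP`, `R0_EXP`), and an experimental loss that reads the observables directly from the parameters instead of obtaining them from simulation. All function names are hypothetical.

```python
# Toy 1D "force field": F(r) = -k * (r - r0), with trainable stiffness k
# and equilibrium spacing r0. All numbers below are made up for illustration.

K_DFT, R0_DFT = 2.0, 1.05   # hypothetical reference behind the "DFT" forces
K_EXP, R0_EXP = 2.4, 1.00   # hypothetical "measured" stiffness and spacing

# Synthetic "DFT" dataset: (distance, force) pairs on a grid from 0.5 to 1.5.
DFT_DATA = [(0.5 + 0.1 * i, -K_DFT * (0.5 + 0.1 * i - R0_DFT)) for i in range(11)]


def dft_loss_and_grad(k, r0):
    """Mean squared force error on DFT_DATA and its gradient w.r.t. (k, r0)."""
    n = len(DFT_DATA)
    loss = grad_k = grad_r0 = 0.0
    for r, f_ref in DFT_DATA:
        res = -k * (r - r0) - f_ref          # predicted minus reference force
        loss += res * res / n
        grad_k += 2.0 * res * (-(r - r0)) / n
        grad_r0 += 2.0 * res * k / n
    return loss, grad_k, grad_r0


def exp_grad(k, r0):
    """Gradient of the property loss (k - K_EXP)^2 + (r0 - R0_EXP)^2."""
    return 2.0 * (k - K_EXP), 2.0 * (r0 - R0_EXP)


def train(epochs, use_exp, lr_dft=0.05, lr_exp=0.3):
    """Gradient descent; with use_exp=True the two trainers alternate each epoch."""
    k, r0 = 1.0, 1.2  # arbitrary starting guess
    for _ in range(epochs):
        # DFT trainer: one gradient step on the force-matching loss.
        _, gk, gr = dft_loss_and_grad(k, r0)
        k, r0 = k - lr_dft * gk, r0 - lr_dft * gr
        if use_exp:
            # Experimental trainer: one gradient step on the property loss.
            gk, gr = exp_grad(k, r0)
            k, r0 = k - lr_exp * gk, r0 - lr_exp * gr
    return k, r0


dft_only = train(2000, use_exp=False)
fused = train(2000, use_exp=True)
print("DFT only (k, r0):", dft_only)
print("Fused    (k, r0):", fused)
```

With only two parameters, the two objectives in this toy conflict directly, so the fused result is a compromise pulled toward the experimental values. The sketch shows the alternation itself, not the paper's finding that a flexible model can satisfy both sets of targets at once.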