Abstract: Molecular dynamics (MD) simulation techniques are widely used for various natural science applications. Increasingly, machine learning (ML) force field (FF) models are beginning to replace ab-initio simulations by predicting forces directly from atomic structures. Despite significant progress in this area, such techniques are primarily benchmarked by their force/energy prediction errors, even though the practical use case would be to produce realistic MD trajectories. We aim to fill this gap by introducing a novel benchmark suite for learned MD simulation. We curate representative MD systems, including water, organic molecules, a peptide, and materials, and design evaluation metrics corresponding to the scientific objectives of respective systems. We benchmark a collection of state-of-the-art (SOTA) ML FF models and illustrate, in particular, how the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics. We demonstrate when and how selected SOTA methods fail, along with offering directions for further improvement. Specifically, we identify stability as a key metric for ML models to improve. Our benchmark suite comes with a comprehensive open-source codebase for training and simulation with ML FFs to facilitate future work.

What problem does this paper attempt to address?

Although machine learning force fields (ML FFs) have made significant advances in predicting forces and energies, they are evaluated primarily by their force/energy prediction errors. In practice, however, these models must generate realistic molecular dynamics (MD) trajectories, so the existing evaluation criteria do not fully capture how an ML FF actually performs. To fill this gap, the authors introduce a new benchmark suite for evaluating ML-based MD simulations. They curate representative MD systems, including water, organic molecules, a peptide, and materials, and design evaluation metrics corresponding to the scientific objectives of each system. With this suite, the authors aim to reveal how state-of-the-art ML FF models perform in practical MD simulations, particularly where they fall short in stability, and to provide guidance for future improvements.

Specifically, the main contributions of the paper are:

1. Introducing a new ML MD simulation benchmark suite, including simulation protocols and quantitative metrics. The authors conducted extensive experiments to benchmark a range of state-of-the-art ML models and provided a complete codebase to lower the entry barrier and promote future research.
2. Demonstrating that many existing models perform poorly in simulation-based benchmarks, even if they excel in force prediction.
3. Summarizing common failure modes through the analysis of MD simulations and discussing the reasons and potential solutions to inspire future research.

In summary, the paper emphasizes the importance of simulation-based evaluation for the practical application of ML FFs and demonstrates through experiments that relying solely on the accuracy of force predictions is insufficient.
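The distinction between a force-error metric and a simulation-based metric can be illustrated with a small sketch. The summary above does not define the benchmark's metrics, so everything here is an assumption made for illustration: the function names `force_mae` and `stable_steps` are hypothetical, and the stability criterion (a trajectory counts as stable until some bond length deviates from its reference by more than a threshold, here 0.5 in the trajectory's length unit) is one plausible choice, not necessarily the one the authors use.

```python
import numpy as np

def force_mae(pred_forces, ref_forces):
    """Mean absolute error over all atoms and Cartesian components."""
    diff = np.asarray(pred_forces, dtype=float) - np.asarray(ref_forces, dtype=float)
    return float(np.mean(np.abs(diff)))

def stable_steps(positions, bonds, ref_lengths, max_dev=0.5):
    """Count the leading frames in which every bond stays within max_dev
    of its reference length (illustrative stability criterion, assumed)."""
    positions = np.asarray(positions, dtype=float)   # shape (T, N, 3)
    bonds = np.asarray(bonds, dtype=int)             # shape (B, 2), atom index pairs
    # Bond vectors for every frame, then their lengths: shape (T, B)
    vec = positions[:, bonds[:, 0]] - positions[:, bonds[:, 1]]
    lengths = np.linalg.norm(vec, axis=-1)
    # A frame is "broken" if any bond deviates too far from its reference
    broken = np.any(np.abs(lengths - np.asarray(ref_lengths)) > max_dev, axis=1)
    # Index of the first broken frame, or the full length if none is broken
    return int(np.argmax(broken)) if broken.any() else len(positions)

# Toy diatomic trajectory: the bond stretches past the threshold at frame 3.
traj = np.zeros((5, 2, 3))
traj[:, 1, 0] = [1.0, 1.1, 1.2, 1.8, 1.0]
print(stable_steps(traj, bonds=[[0, 1]], ref_lengths=np.array([1.0])))  # 3
```

The two functions consume different inputs: `force_mae` needs only per-frame predictions on a fixed test set, while `stable_steps` needs a trajectory actually generated by running the model. That is why a model can score well on the first and poorly on the second.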

Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations
