Learning Together: Towards foundational models for machine learning interatomic potentials with meta-learning

Alice E. A. Allen, Nicholas Lubbers, Sakib Matin, Justin Smith, Richard Messerly, Sergei Tretiak, Kipton Barros
2023-07-09
Abstract: The development of machine learning models has led to an abundance of datasets containing quantum mechanical (QM) calculations for molecular and material systems. However, traditional training methods for machine learning models are unable to leverage the plethora of data available as they require that each dataset be generated using the same QM method. Taking machine learning interatomic potentials (MLIPs) as an example, we show that meta-learning techniques, a recent advancement from the machine learning community, can be used to fit multiple levels of QM theory in the same training process. Meta-learning changes the training procedure to learn a representation that can be easily re-trained to new tasks with small amounts of data. We then demonstrate that meta-learning enables simultaneously training to multiple large organic molecule datasets. As a proof of concept, we examine the performance of an MLIP refit to a small drug-like molecule and show that pre-training potentials to multiple levels of theory with meta-learning improves performance. This difference in performance can be seen both in the reduced error and in the improved smoothness of the potential energy surface produced. We therefore show that meta-learning can utilize existing datasets with inconsistent QM levels of theory to produce models that are better at specializing to new datasets. This opens new routes for creating pre-trained, foundational models for interatomic potentials.
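Why traditional training needs every dataset to share one QM method can be seen in a deliberately tiny example. The sketch below is a toy built on assumptions, not anything taken from the paper: it uses NumPy, a hypothetical one-dimensional geometry descriptor `x`, and two made-up "levels of theory" whose energies differ by a constant offset of 0.5. Fitting a single model to the pooled labels forces a compromise between them.

```python
import numpy as np

# Hypothetical toy data: the same eleven "geometries" (a 1-D descriptor x)
# labelled by two levels of theory whose energies differ by a constant shift.
x = np.linspace(0.0, 1.0, 11)
e_level_a = 2.0 * x + 1.0
e_level_b = 2.0 * x + 1.5  # assumed systematic offset of 0.5

# Naive pooling: fit one linear model E(x) = slope*x + intercept to both label sets.
x_pooled = np.concatenate([x, x])
e_pooled = np.concatenate([e_level_a, e_level_b])
design = np.stack([x_pooled, np.ones_like(x_pooled)], axis=1)
(slope, intercept), *_ = np.linalg.lstsq(design, e_pooled, rcond=None)


def rmse(pred, target):
    # Root-mean-square error between two energy arrays.
    return float(np.sqrt(np.mean((pred - target) ** 2)))


fit = slope * x + intercept
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"RMSE vs level A: {rmse(fit, e_level_a):.3f}, vs level B: {rmse(fit, e_level_b):.3f}")
```

The pooled fit lands halfway between the two label sets (slope 2, intercept 1.25), so it misses each one with an RMSE of 0.25. A constant shift is the simplest possible disagreement; differences between real QM methods also vary with geometry, so they cannot in general be removed by a single constant correction.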
Chemical Physics, Computational Physics
What problem does this paper attempt to address?
This paper addresses how to make effective use of datasets computed at different levels of quantum mechanical (QM) theory when training machine learning interatomic potentials (MLIPs). Conventional training requires every dataset to be generated with the same QM method, so much of the available data cannot be used together. The paper tackles this by introducing meta-learning and showing that multiple levels of QM theory can be fitted within a single training process. Meta-learning changes the training procedure so that the model learns a representation that can be quickly re-trained for a new task using only a small amount of data. With this approach, existing datasets with inconsistent levels of QM theory can be combined to produce models that specialize better to new datasets, which opens a route toward pre-trained, foundational interatomic potentials. The paper also applies meta-learning to several large organic molecule datasets simultaneously and, as a proof of concept, shows that pre-training with meta-learning improves the performance of an MLIP refit to a small drug-like molecule, giving both lower errors and a smoother potential energy surface.
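The meta-learning idea described above can be sketched on a toy problem. The text does not name a specific algorithm; the sketch below assumes Reptile, one common first-order meta-learning method, and uses NumPy with hypothetical features, task weights, and hyperparameters (`features`, `W_SHARED`, `spread`, `inner_lr`, `eps`) chosen purely for illustration. Each "task" stands in for a dataset at a different level of theory, and a linear model in hand-picked features stands in for an MLIP.

```python
import numpy as np

# Hypothetical toy setup: a 1-D "geometry" descriptor x in [0, 1] and
# hand-picked features. Each task stands in for a dataset computed at a
# different level of QM theory; tasks share most of their structure and
# differ by small level-specific corrections.
def features(x):
    return np.stack([np.sin(3.0 * x), x, np.ones_like(x)], axis=1)


W_SHARED = np.array([5.0, -8.0, 3.0])  # assumed common part of the energy model


def make_task(rng, spread=0.02):
    # True weights for one hypothetical level of theory.
    return W_SHARED + spread * rng.standard_normal(3)


def mse_grad(w, x, y):
    # Gradient of the mean squared error of the linear model features(x) @ w.
    phi = features(x)
    return 2.0 * phi.T @ (phi @ w - y) / len(x)


def gradient_descent(w, x, y, lr, steps):
    # Full-batch gradient descent starting from a copy of w.
    w = w.copy()
    for _ in range(steps):
        w -= lr * mse_grad(w, x, y)
    return w


def reptile(tasks, rng, outer_steps=3000, inner_steps=10, inner_lr=0.1, eps=0.1):
    # Reptile: adapt a copy of theta to one sampled task, then move theta
    # a fraction eps toward the adapted weights.
    theta = np.zeros(3)
    for _ in range(outer_steps):
        w_task = tasks[rng.integers(len(tasks))]
        x = rng.uniform(0.0, 1.0, size=20)
        y = features(x) @ w_task
        adapted = gradient_descent(theta, x, y, inner_lr, inner_steps)
        theta += eps * (adapted - theta)
    return theta


def energy_rmse(w, w_true):
    # Energy error of weights w against the true task weights on a dense grid.
    x = np.linspace(0.0, 1.0, 200)
    return float(np.sqrt(np.mean((features(x) @ (w - w_true)) ** 2)))


rng = np.random.default_rng(0)
train_tasks = [make_task(rng) for _ in range(5)]  # five hypothetical "levels of theory"
theta = reptile(train_tasks, rng)

# Refit to a new, unseen level of theory using only five labelled points.
w_new = make_task(rng)
x_few = rng.uniform(0.0, 1.0, size=5)
y_few = features(x_few) @ w_new
meta_err = energy_rmse(gradient_descent(theta, x_few, y_few, lr=0.05, steps=5), w_new)
scratch_err = energy_rmse(gradient_descent(np.zeros(3), x_few, y_few, lr=0.05, steps=5), w_new)
print(f"meta-learned init: {meta_err:.3f}  from scratch: {scratch_err:.3f}")
```

In this linear toy, Reptile drives `theta` toward roughly the average of the training tasks' weights, so a few gradient steps on five points are enough to approach the new task, while the same steps from a zero initialization leave a much larger energy error. This mirrors, in miniature, the claim that a meta-learned representation can be re-trained to new tasks with small amounts of data; it says nothing about the magnitudes reported in the paper.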