Active learning meets metadynamics: Automated workflow for reactive machine learning potentials

Fernanda Duarte, Veronika Juraskova, Tristan Johnston-Wood, Hanwen Zhang, Valdas Vitartas
DOI: https://doi.org/10.26434/chemrxiv-2024-twmlz
2024-11-01
Abstract: Atomistic simulations driven by machine learning-based potentials (MLPs) are a cost-effective alternative to ab initio molecular dynamics (AIMD). Yet, their broad applicability in reaction modelling remains hindered, in part, by the need for large training datasets that adequately sample the relevant potential energy surface, including high-energy transition state (TS) regions. To optimise dataset generation and extend the use of MLPs for reaction modelling, we present a workflow that combines automated active learning with well-tempered metadynamics, requiring no prior knowledge of TSs. Using data-efficient architectures, such as the linear Atomic Cluster Expansion, we illustrate the performance of this strategy in various organic reactions where the environment is described at different levels, including the SN2 reaction between fluoride and chloromethane in implicit water, the methyl shift of 2,2-dimethylisoindene in the gas phase, and a glycosylation reaction in explicit dichloromethane solution, where competitive pathways exist. The proposed training strategy yields accurate and stable MLPs for all three cases, highlighting its versatility for modelling reactive processes.
Chemistry
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper addresses how to efficiently generate training datasets for machine learning potentials (MLPs) used in reaction modelling, so that the data adequately sample the relevant potential energy surface, including high-energy transition state (TS) regions. Although MLPs are a cost-effective alternative to ab initio molecular dynamics (AIMD) in atomistic simulations, their broad application to reactions is still limited, in part, by the need for large training datasets that cover these regions. To optimise dataset generation, the authors propose a workflow that combines automated active learning (AL) with well-tempered metadynamics (WTMetaD). The workflow requires no prior knowledge of TSs and uses data-efficient architectures such as the linear Atomic Cluster Expansion. The authors illustrate the strategy on three organic reactions with the environment described at different levels: the SN2 reaction between fluoride and chloromethane in implicit water, the methyl shift of 2,2-dimethylisoindene in the gas phase, and a glycosylation reaction in explicit dichloromethane solution, where competitive pathways exist. The training strategy yields accurate and stable MLPs in all three cases, highlighting its versatility for modelling reactive processes.
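The general idea of pairing active learning with well-tempered metadynamics can be illustrated with a one-dimensional toy model. The sketch below is not the authors' implementation, and every name and parameter in it is an assumption made for illustration: an analytic double well stands in for the ab initio reference, a Lagrange interpolant through the training points stands in for the MLP, and the selection rule (add the visited point with the largest energy error above a threshold) is just one simple choice. The hill height follows the standard well-tempered rule, in which each new Gaussian is scaled by exp(-V_bias / k_B ΔT).

```python
import math
import random

KT = 1.0               # k_B*T in reduced units (assumed)
DELTA_KT = 4.0         # k_B*DeltaT, controls how fast hill heights decay (assumed)
W0, SIGMA = 0.5, 0.2   # initial hill height and hill width (assumed)
DT = 1e-3              # time step of the overdamped Langevin integrator (assumed)


def reference_energy(x):
    """Toy stand-in for the ab initio reference: double well, barrier of 5 at x = 0."""
    return 5.0 * (x * x - 1.0) ** 2


def surrogate_energy(x, train_x):
    """Toy stand-in for the MLP: Lagrange interpolant through the training points."""
    total = 0.0
    for i, xi in enumerate(train_x):
        weight = 1.0
        for j, xj in enumerate(train_x):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += reference_energy(xi) * weight
    return total


def bias_energy(x, hills):
    """Sum of the Gaussian hills, stored as (center, height), deposited so far."""
    return sum(h * math.exp(-((x - c) ** 2) / (2 * SIGMA**2)) for c, h in hills)


def biased_energy(x, train_x, hills):
    """Surrogate energy plus the metadynamics bias."""
    return surrogate_energy(x, train_x) + bias_energy(x, hills)


def run_wtmetad(train_x, rng, x0=-1.0, n_steps=2000, stride=50):
    """Well-tempered metadynamics on the surrogate; returns visited points and hills."""
    x, hills, visited = x0, [], []
    eps = 1e-4  # finite-difference step used to estimate the force
    for step in range(1, n_steps + 1):
        force = -(biased_energy(x + eps, train_x, hills)
                  - biased_energy(x - eps, train_x, hills)) / (2 * eps)
        x += DT * force + math.sqrt(2 * KT * DT) * rng.gauss(0.0, 1.0)
        x = max(-2.0, min(2.0, x))  # hard walls keep the toy walker bounded
        if step % stride == 0:
            # Well-tempered rule: hills shrink where bias has already accumulated.
            height = W0 * math.exp(-bias_energy(x, hills) / DELTA_KT)
            hills.append((x, height))
            visited.append(x)
    return visited, hills


def active_learning(n_iter=10, threshold=0.1, seed=0):
    """Grow the training set until the surrogate matches the reference on visited points."""
    rng = random.Random(seed)
    train_x = [-1.1, -1.0, -0.9]  # initial data near the reactant minimum only
    for _ in range(n_iter):
        visited, _hills = run_wtmetad(train_x, rng)
        errors = [abs(surrogate_energy(x, train_x) - reference_energy(x))
                  for x in visited]
        worst = max(range(len(visited)), key=errors.__getitem__)
        if errors[worst] < threshold:
            return train_x, True   # no visited point exceeds the error threshold
        train_x.append(visited[worst])  # label the worst point with the reference
    return train_x, False


if __name__ == "__main__":
    points, converged = active_learning()
    print("converged:", converged, "training points:", len(points))
```

In this toy the reference energy is evaluated at every visited point, which is only affordable because it is an analytic function; with a real ab initio reference, a cheaper selection criterion would be needed. The loop also terminates quickly only because the toy reference is a quartic polynomial that the interpolant can reproduce exactly once it has five points.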