Constructing accurate and efficient general-purpose atomistic machine learning model with transferable accuracy for quantum chemistry

Yicheng Chen, Wenjie Yan, Zhanfeng Wang, Jianming Wu, Xin Xu
2024-08-12
Abstract: Density Functional Theory (DFT) has been a cornerstone in computational science, providing powerful insights into structure-property relationships for molecules and materials through first-principles quantum-mechanical (QM) calculations. However, the advent of atomistic machine learning (ML) is reshaping the landscape by enabling large-scale dynamics simulations and high-throughput screening at DFT-equivalent accuracy with drastically reduced computational cost. Yet the development of general-purpose atomistic ML models as surrogates for QM calculations faces several challenges, particularly in terms of model capacity, data efficiency, and transferability across chemically diverse systems. This work introduces a novel extension of the polarizable atom interaction neural network (namely, XPaiNN) to address these challenges. Two distinct training strategies have been employed: direct learning, and $\Delta$-ML on top of a semi-empirical QM method. Both have been implemented within the same framework, allowing a detailed comparison of their results. The XPaiNN models, in particular the one using $\Delta$-ML, not only perform competitively on standard benchmarks but also prove effective relative to other ML models and QM methods on a comprehensive set of downstream tasks, including non-covalent interactions, reaction energetics, barrier heights, geometry optimization, and reaction thermodynamics. This work represents a significant step forward in the pursuit of accurate and efficient general-purpose atomistic ML models capable of handling complex chemical systems with transferable accuracy.
Chemical Physics
What problem does this paper attempt to address?
The paper aims to address the problem of constructing efficient and accurate general-purpose atomistic machine learning (ML) models that can handle quantum chemistry calculations with transferable accuracy. Specifically, the research targets the following major challenges:

1. **Model Capacity and Data Efficiency**: General-purpose atomistic ML models need sufficient capacity to learn complex patterns while remaining data-efficient, meaning that highly transferable models can be trained on smaller datasets.
2. **Transferability Across Chemical Systems**: The model needs to maintain good predictive performance across chemically diverse systems.

To address these challenges, the researchers propose a new model called XPaiNN. This model is an extension of the Polarizable Atom Interaction Neural Network (PaiNN) and employs a Graph Neural Network (GNN) structure. XPaiNN enhances the model's representational capacity by introducing spherical feature channels and uses element-specific embeddings to initialize node features, which helps reflect the periodic trends of the elements.

The paper describes two training strategies:

- **Direct Learning**: The model directly fits the target energy and force labels.
- **Δ-ML Strategy**: Building on the direct learning approach, this strategy uses a semi-empirical quantum-mechanical method (such as GFN2-xTB) as a baseline, and the model fits the residual between the target level of theory and that baseline. This approach can improve the model's accuracy and transferability.

Additionally, the research team used the SPICE dataset to train the model. This dataset contains a large number of organic molecule conformations covering a variety of element types, so it represents the relevant chemical space well.
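The difference between the two training strategies can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the one-dimensional descriptor `x`, the functions `toy_baseline` and `toy_reference`, and the linear least-squares fit that stands in for the neural network. It is not XPaiNN, it does not call GFN2-xTB, and it fits energies only. It shows only the bookkeeping: direct learning fits the target label itself, while Δ-ML fits the residual and adds the baseline back at prediction time.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def train_direct(xs: Sequence[float], targets: Sequence[float]) -> Model:
    """Direct learning: fit the target labels themselves."""
    slope, intercept = fit_line(xs, targets)
    return lambda x: slope * x + intercept


def train_delta(xs: Sequence[float], targets: Sequence[float],
                baseline: Model) -> Model:
    """Delta-ML: fit (target - baseline), then add the baseline back."""
    residuals = [t - baseline(x) for x, t in zip(xs, targets)]
    slope, intercept = fit_line(xs, residuals)
    return lambda x: baseline(x) + slope * x + intercept


# Hypothetical stand-ins (assumptions, not real chemistry):
# the cheap baseline captures most of the behaviour of the reference.
def toy_baseline(x: float) -> float:
    return x ** 2


def toy_reference(x: float) -> float:
    return x ** 2 + 0.1 * x


train_x = [0.0, 0.5, 1.0, 1.5, 2.0]
train_y = [toy_reference(x) for x in train_x]

direct_model = train_direct(train_x, train_y)
delta_model = train_delta(train_x, train_y, toy_baseline)

# Evaluate outside the training range to probe transferability.
test_x = 3.0
direct_error = abs(direct_model(test_x) - toy_reference(test_x))
delta_error = abs(delta_model(test_x) - toy_reference(test_x))
```

In this toy setting the residual is much simpler than the full target, so the same small model extrapolates far better in the Δ-ML setup. That is the intuition behind the strategy, not a reproduction of the paper's results.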
Using these two training strategies, the researchers compare the resulting models and validate them on a series of downstream tasks, including reaction energies, non-covalent interactions, and geometry optimization.

In summary, the goal of this paper is to develop an efficient and accurate general-purpose atomistic ML framework that performs well across a wide range of chemical systems and overcomes limitations of existing models. Through its architectural changes and training strategies, XPaiNN is competitive on standard benchmarks and shows good transferability in downstream tasks.