Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials

Yuyang Wang, Changwen Xu, Zijie Li, Amir Barati Farimani
DOI: https://doi.org/10.1021/acs.jctc.3c00289
2023-07-06
Abstract: Recent advances in equivariant graph neural networks (GNNs) have made deep learning amenable to developing fast surrogate models to expensive ab initio quantum mechanics (QM) approaches for molecular potential predictions. However, building accurate and transferable potential models using GNNs remains challenging, as the data is greatly limited by the expensive computational costs and level of theory of QM methods, especially for large and complex molecular systems. In this work, we propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions. Specifically, atomic coordinates of sampled nonequilibrium conformations are perturbed by random noises and GNNs are pretrained to denoise the perturbed molecular conformations which recovers the original coordinates. Rigorous experiments on multiple benchmarks reveal that pretraining significantly improves the accuracy of neural potentials. Furthermore, we show that the proposed pretraining approach is model-agnostic, as it improves the performance of different invariant and equivariant GNNs. Notably, our models pretrained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems, including different elements, charged molecules, biomolecules, and larger systems. These results highlight the potential for leveraging denoise pretraining approaches to build more generalizable neural potentials for complex molecular systems.
Machine Learning, Artificial Intelligence, Chemical Physics
What problem does this paper attempt to address?
The paper addresses the problem of building accurate and transferable molecular potential models when training data is scarce. Labels come from ab initio quantum mechanics (QM) calculations, which are expensive, especially for large and complex molecular systems. To tackle this, the study proposes denoise pretraining on nonequilibrium molecular conformations to improve the accuracy and generalization of graph neural networks (GNNs) for molecular potential prediction. The main contributions are:

1. **Denoise pretraining strategy**: Random noise is added to the atomic coordinates of sampled nonequilibrium conformations, and the GNN is pretrained to remove that noise and recover the original coordinates. The pretrained GNN is then fine-tuned for potential prediction, where pretraining improves accuracy.
2. **Improved transferability**: Models pretrained on small molecules perform better when fine-tuned on other kinds of molecular systems, including molecules with different elements, charged molecules, biomolecules, and larger systems.
3. **Model agnosticism**: The pretraining method improves both invariant and equivariant GNN architectures, including SchNet, SE(3)-Transformer, EGNN, and TorchMD-Net.
4. **Experimental validation**: Experiments on multiple benchmark datasets demonstrate the effectiveness of the method, particularly when pretraining on small molecules is transferred to more complex molecular systems.

In summary, the paper proposes pretraining GNNs with a denoising task on nonequilibrium molecular conformations and shows that this improves both the accuracy and the generalization of neural potentials.
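
The denoising objective described above can be illustrated with a minimal sketch. It uses only the Python standard library and makes several assumptions not stated in the source: the noise is i.i.d. Gaussian with a hypothetical standard deviation `sigma`, the loss is a mean squared error on the predicted noise, and `zero_model` is a hypothetical placeholder standing in for a real GNN. Predicting the added noise is treated here as equivalent to recovering the original coordinates, since the clean coordinates equal the noisy ones minus the noise. The paper's exact parameterization may differ.

```python
import random


def perturb(coords, sigma, rng):
    """Return (noisy_coords, noise), adding i.i.d. Gaussian noise of std sigma."""
    noise = [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in coords]
    noisy = [[c + e for c, e in zip(atom, eps)] for atom, eps in zip(coords, noise)]
    return noisy, noise


def denoise_loss(predicted_noise, noise):
    """Mean squared error over all atoms and Cartesian components."""
    sq = [(p - e) ** 2
          for p_atom, e_atom in zip(predicted_noise, noise)
          for p, e in zip(p_atom, e_atom)]
    return sum(sq) / len(sq)


def zero_model(noisy_coords):
    """Hypothetical stand-in for a GNN: always predicts zero noise."""
    return [[0.0, 0.0, 0.0] for _ in noisy_coords]


rng = random.Random(0)
# Toy three-atom geometry in angstroms (illustrative values, not from the paper).
coords = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
noisy, noise = perturb(coords, sigma=0.1, rng=rng)

baseline = denoise_loss(zero_model(noisy), noise)  # untrained placeholder predictor
perfect = denoise_loss(noise, noise)               # ideal predictor that knows the noise

# Subtracting the (ideally predicted) noise recovers the original coordinates.
recovered = [[x - e for x, e in zip(atom, eps)] for atom, eps in zip(noisy, noise)]
print(f"baseline loss = {baseline:.4f}, perfect loss = {perfect:.4f}")
```

In an actual pretraining run, the placeholder would be replaced by one of the GNN architectures and the loss minimized by gradient descent over many sampled conformations. The pretrained weights would then initialize fine-tuning on energies and forces.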