Reducing the Cost of Quantum Chemical Data By Backpropagating Through Density Functional Theory

Alexander Mathiasen, Hatem Helal, Paul Balanca, Adam Krzywaniak, Ali Parviz, Frederik Hvilshøj, Blazej Banaszewski, Carlo Luschi, Andrew William Fitzgibbon
2024-02-06
Abstract: Density Functional Theory (DFT) accurately predicts the quantum chemical properties of molecules, but scales as $O(N_{\text{electrons}}^3)$. Schütt et al. (2019) successfully approximate DFT 1000x faster with Neural Networks (NN). Arguably, the biggest problem one faces when scaling to larger molecules is the cost of DFT labels. For example, it took years to create the PCQ dataset (Nakata & Shimazaki, 2017) on which subsequent NNs are trained within a week. DFT labels molecules by minimizing energy $E(\cdot )$ as a "loss function." We bypass dataset creation by directly training NNs with $E(\cdot )$ as a loss function. For comparison, Schütt et al. (2019) spent 626 hours creating a dataset on which they trained their NN for 160h, for a total of 786h; our method achieves comparable performance within 31h.
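The time comparison in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the hours quoted above; the variable names are illustrative, and the scaling line assumes the stated $O(N_{\text{electrons}}^3)$ cost with constant factors ignored.

```python
# Hours quoted in the abstract for Schütt et al. (2019).
dataset_hours = 626    # creating the DFT-labelled dataset
training_hours = 160   # training the NN on those labels
baseline_total = dataset_hours + training_hours

# Hours quoted in the abstract for training directly with E(.) as the loss.
energy_loss_hours = 31

speedup = baseline_total / energy_loss_hours
print(baseline_total)      # 786
print(round(speedup, 1))   # 25.4

# Under cubic scaling, doubling the electron count multiplies the
# cost of one DFT calculation by 2**3 (constant factors ignored).
cost_ratio_when_doubling = 2 ** 3
print(cost_ratio_when_doubling)  # 8
```

On these figures, the combined cost of data creation and training drops by roughly a factor of 25.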
Machine Learning
What problem does this paper attempt to address?
This paper explores how to reduce the cost of acquiring quantum chemical data. Density functional theory (DFT) predicts the quantum chemical properties of molecules accurately but is computationally expensive: its cost scales cubically with the number of electrons, $O(N_{\text{electrons}}^3)$. The traditional approach first labels a large set of molecules with DFT and then trains a neural network (NN) on those labels, and generating the dataset takes far longer than training the network. To avoid this, the authors propose a pre-training technique that uses the DFT energy function E(·) directly as the loss function, so no DFT labels have to be generated.

The method, called Quantum Pretraining Transformer (QPT), backpropagates through E(·) during training, so every training iteration can draw new data samples. This helps prevent overfitting and offers the potential to scale the model arbitrarily. The paper also highlights several techniques that improve performance: using initial DFT guesses to accelerate optimization, quantum bias attention, and density mixing.

Experimental results show that QPT reaches prediction accuracy comparable to previous methods without precomputed DFT labels, while greatly reducing the combined time for data creation and training: about 31 hours, compared with 786 hours (626 for dataset creation plus 160 for training) for Schütt et al. (2019). The approach opens up new avenues for pretraining neural network models on large molecules, with potential future applications such as predicting protein-ligand interactions at a larger scale.
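The idea of training with an energy as the loss, rather than with precomputed labels, can be illustrated on a toy problem. The sketch below is not DFT and not the QPT architecture: it assumes a hypothetical 2x2 "Hamiltonian" H(x) = [[0, x], [x, 1]] per input x, a one-weight linear model in place of a neural network, and a hand-derived gradient in place of automatic differentiation. By the variational principle, the energy of any normalized trial vector is an upper bound on the lowest eigenvalue, so minimizing it needs no labels.

```python
import math
import random

def energy(theta, x):
    """Energy c^T H(x) c for the normalized vector c = (cos theta, sin theta)."""
    return x * math.sin(2 * theta) + math.sin(theta) ** 2

def energy_grad_theta(theta, x):
    """Analytic dE/dtheta (stands in for backpropagation in this toy)."""
    return 2 * x * math.cos(2 * theta) + math.sin(2 * theta)

def exact_ground_energy(x):
    """Lowest eigenvalue of H(x); used for evaluation only, never for training."""
    return (1 - math.sqrt(1 + 4 * x * x)) / 2

def mean_gap(w, xs):
    """Average excess energy of the model theta = w * x over inputs xs."""
    return sum(energy(w * x, x) - exact_ground_energy(x) for x in xs) / len(xs)

def train(steps=500, lr=1.0, seed=0):
    rng = random.Random(seed)
    w = 0.0  # single model weight (hypothetical stand-in for NN parameters)
    for _ in range(steps):
        x = rng.uniform(-0.3, 0.3)  # fresh input every step, no stored dataset
        theta = w * x               # model prediction
        # Chain rule: dE/dw = dE/dtheta * dtheta/dw, with dtheta/dw = x.
        w -= lr * energy_grad_theta(theta, x) * x
    return w

eval_xs = [i / 100 for i in range(-30, 31)]
w_trained = train()
gap_before = mean_gap(0.0, eval_xs)
gap_after = mean_gap(w_trained, eval_xs)
print(f"w = {w_trained:.3f}, mean gap before = {gap_before:.5f}, after = {gap_after:.5f}")
```

The exact eigenvalue appears only in `mean_gap`, to measure how far the model is from the true minimum; the training loop never sees it, which mirrors the label-free setup described above.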