When Do Quantum Mechanical Descriptors Help Graph Neural Networks Predict Chemical Properties?

Shih-Cheng Li, Haoyang Wu, Angiras Menon, Kevin Spiekermann, Yi-Pei Li, William Green
DOI: https://doi.org/10.26434/chemrxiv-2024-7q438-v2
2024-04-04
Abstract: Deep graph neural networks are extensively utilized to predict chemical reactivity and molecular properties. However, because of the complexity of chemical space, such models often have difficulty extrapolating beyond the chemistry contained in the training set. Augmenting a model with quantum mechanical (QM) descriptors is anticipated to improve its generalizability. However, obtaining QM descriptors often requires CPU-intensive computational chemistry calculations. To identify when QM descriptors help graph neural networks predict chemical properties, we conduct a systematic investigation of the impact of atom, bond, and molecular QM descriptors on the performance of directed message passing neural networks (D-MPNNs) for predicting 16 molecular properties. The analysis surveys computational and experimental targets, classification and regression tasks, and dataset sizes ranging from several hundred to hundreds of thousands of datapoints. Our results indicate that QM descriptors are mostly beneficial for D-MPNN performance on small datasets, provided that the descriptors correlate well with the targets and can be readily computed with high accuracy. Otherwise, using QM descriptors can add cost without benefit or even introduce unwanted noise that can degrade model performance. Strategic integration of QM descriptors with D-MPNNs unlocks potential for physics-informed, data-efficient modeling with some interpretability that can streamline de novo drug and material design. To facilitate the use of QM descriptors in machine learning workflows for chemistry, we provide a set of guidelines regarding when and how to best leverage QM descriptors, a high-throughput workflow to compute them, and an enhancement to Chemprop, a widely adopted open-source D-MPNN implementation for chemical property prediction.
Chemistry
What problem does this paper attempt to address?
The paper primarily explores the role of quantum mechanical (QM) descriptors in predicting chemical properties using graph neural networks (GNNs), and the conditions under which they are best applied. Specifically, the paper aims to address the following key questions:

1. **Effectiveness of QM Descriptors**: The paper systematically studies the impact of atom-, bond-, and molecule-level QM descriptors on the performance of directed message passing neural networks (D-MPNNs) in predicting 16 different molecular properties.
2. **Impact of Dataset Size**: The study examines the effect of different dataset sizes (ranging from a few hundred to several hundred thousand data points) on model performance and analyzes the differences in performance between classification and regression tasks.
3. **Best Practices for QM Descriptors**: Based on the research findings, the paper proposes a decision flowchart to guide how and when to best utilize QM descriptors to enhance the performance of GNN models.
4. **Computation and Integration of QM Descriptors**: The paper provides a high-throughput workflow for computing these descriptors and improves the Chemprop software package to support the efficient integration of QM descriptors.

The core objective of the paper is to determine when QM descriptors help improve the accuracy of GNNs in predicting chemical properties, especially in data-limited scenarios. The study finds that QM descriptors are generally beneficial for smaller datasets, provided that these descriptors are highly relevant to the target properties and can be accurately computed. However, as the dataset size increases, the advantage of QM descriptors gradually diminishes. Additionally, the paper offers a series of recommendations on how to select and effectively utilize QM descriptors to optimize model performance and streamline the design process of new drugs and materials.
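The condition that descriptors should correlate well with the targets can be checked cheaply before any model is trained. The following is a minimal sketch of such a screen, not the paper's workflow or its decision flowchart: the function names, the descriptor names `desc_a` and `desc_b`, the synthetic data, and the `min_abs_r=0.5` cutoff are all illustrative assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant column; treat as uninformative
    return sxy / math.sqrt(sxx * syy)


def screen_descriptors(descriptors, target, min_abs_r=0.5):
    """Keep descriptors whose |r| with the target meets a (hypothetical) cutoff.

    descriptors: dict mapping descriptor name -> list of values, one per molecule
    target:      list of target values, one per molecule
    """
    kept = []
    for name, values in descriptors.items():
        r = pearson_r(values, target)
        if abs(r) >= min_abs_r:
            kept.append((name, r))
    return kept


# Synthetic stand-in data: one descriptor tracks the target, one is pure noise.
rng = random.Random(0)
target = [rng.gauss(0.0, 1.0) for _ in range(200)]
descriptors = {
    "desc_a": [t + 0.1 * rng.gauss(0.0, 1.0) for t in target],  # strongly correlated
    "desc_b": [rng.gauss(0.0, 1.0) for _ in target],            # unrelated to the target
}
selected = screen_descriptors(descriptors, target)
selected_names = [name for name, _ in selected]
```

A linear correlation screen like this only captures one of the stated conditions. It says nothing about whether a descriptor can be computed accurately or at acceptable cost, and a descriptor with a nonlinear relationship to the target could be missed.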