HCat-GNet: An Interpretable Graph Neural Network for Catalysis Optimization

Eduardo Alberto Aguilar Bejarano, Simon Woodward, Grazziela Figueredo, Hon Wai Lam, Ender Özcan, Raja Rit
DOI: https://doi.org/10.26434/chemrxiv-2024-zjnkd
2024-02-22
Abstract: Homogeneous catalysts enable faster conversions of molecules with higher selectivities (stereo- and regioselectivity) in chemical reactions. Traditionally, catalyst improvements are made through empirical trials, in which the catalyst is functionalised by adding, removing or modifying groups within its structure and the catalytic activity of the new structure is then re-evaluated. This procedure is inefficient and leads to unsuccessful trials that waste resources. Machine learning (ML) approaches have been proposed to accelerate homogeneous asymmetric catalyst optimization. However, these often lack a general descriptor generation procedure that allows molecules from a broad region of chemical space to be encoded. To overcome this, we propose a homogeneous catalyst graph neural network (HCat-GNet) that predicts the selectivity of catalysts given the SMILES of the participant molecules. We demonstrate its use in rhodium-catalyzed asymmetric 1,4-addition (RhCAA), a reaction of major importance in organic synthesis. We benchmark HCat-GNet against traditional ML methods for its ability to predict RhCAA stereoselectivity using two chiral diene ligand datasets: one for learning and one for final testing. For the learning dataset, traditional ML and HCat-GNet give comparable results. However, when presented with the new, unseen test dataset, traditional ML models perform poorly, while HCat-GNet retains a general ability to accurately predict product absolute stereochemistry and reaction stereoselectivity. Furthermore, HCat-GNet allows model interpretability, permitting analysis of the effect of ligand substituents in determining reaction selectivity. HCat-GNet shows greater potential for catalyst optimization than traditional ML, as it allows a non-fixed number of participant molecules to be used to train the model and requires only the SMILES of the molecules to create graph representations. HCat-GNet allows more general models that accurately extrapolate into unseen regions of chemical space.
Chemistry
What problem does this paper attempt to address?
The paper addresses a challenge in catalyst optimization: how to efficiently improve the selectivity of homogeneous catalysts. Traditional optimization relies on trial and error, which is uneconomical and wastes resources on unsuccessful trials. The paper proposes a graph neural network model, HCat-GNet, that predicts catalyst selectivity from the SMILES of the participant molecules, and demonstrates it on rhodium-catalyzed asymmetric 1,4-addition (RhCAA) reactions.
Compared with traditional machine learning methods, HCat-GNet can handle a variable number of participant molecules and can generate general descriptors for molecules from a wide region of chemical space, which allows more accurate predictions for unseen reactions. HCat-GNet is also interpretable: it can be used to analyze how ligand substituents affect reaction selectivity.
The paper validates HCat-GNet through a case study on RhCAA using two datasets: one of traditional chiral diene ligands (the learning dataset) and one of newly reported, structurally diverse ligands by Rit et al. (the unseen, final testing dataset). Traditional machine learning methods perform well on the learning dataset, but their performance declines on the unseen test data, whereas HCat-GNet maintains good predictive accuracy and generalization. These results indicate that HCat-GNet can predict the selectivity of reactions with new ligands, showing its potential and transferability in catalyst optimization.
In summary, the paper aims to show how machine learning, and graph neural networks in particular, can predict and help optimize the selectivity of homogeneous catalysts in chemical synthesis with better generalization.
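To make the "variable number of participant molecules" point concrete, here is a minimal Python sketch of one common way to combine several molecule graphs into a single reaction-level graph: a disjoint union with shifted node indices. This is an illustration under stated assumptions, not the authors' implementation. The names `MolGraph` and `merge_graphs` are hypothetical, the toy molecules are not from the paper, and the atom and bond lists are written by hand; in practice a cheminformatics toolkit would derive them from the SMILES strings.

```python
from dataclasses import dataclass


@dataclass
class MolGraph:
    """One molecule as a graph: atom symbols are nodes, bonds are index pairs.

    Hypothetical container used only for this sketch.
    """
    atoms: list
    bonds: list


def merge_graphs(molecules):
    """Build the disjoint union of any number of molecule graphs.

    Returns (nodes, edges, mol_index), where mol_index[i] records which
    input molecule node i came from.
    """
    nodes, edges, mol_index = [], [], []
    for mol_id, mol in enumerate(molecules):
        offset = len(nodes)  # shift this molecule's atom indices
        nodes.extend(mol.atoms)
        mol_index.extend([mol_id] * len(mol.atoms))
        for a, b in mol.bonds:
            edges.append((a + offset, b + offset))
    return nodes, edges, mol_index


# Hand-written heavy-atom graphs (toy inputs, assumed for illustration).
ethene = MolGraph(atoms=["C", "C"], bonds=[(0, 1)])                # SMILES: C=C
methanol = MolGraph(atoms=["C", "O"], bonds=[(0, 1)])              # SMILES: CO
propane = MolGraph(atoms=["C", "C", "C"], bonds=[(0, 1), (1, 2)])  # SMILES: CCC

# The same function accepts two, three, or more participant molecules.
nodes, edges, mol_index = merge_graphs([ethene, methanol, propane])
print(nodes)
print(edges)
print(mol_index)
```

Because the merged graph has no fixed size, a model that operates on nodes and edges does not need a fixed-length descriptor vector per reaction, which is the property the summary attributes to graph-based encodings.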