Efficient Integration of Molecular Representation and Message-Passing Neural Networks for Predicting Small Molecule Drug-like Properties

Shreyas Bhat Brahmavar, Mrunmay Mohan Shelar, Revanth Harinarthini, Bandaru Hemanth Sai Krishna, Nahush Harihar Kumta, Ojas Wadhwani, Raviprasad Aduri
DOI: https://doi.org/10.26434/chemrxiv-2024-jj94j
2024-03-15
Abstract: The physicochemical properties of a drug molecule determine its metabolic properties. Hybrid quantum-mechanics approaches combined with computer-aided drug design, and more recently supervised machine-learning approaches, have been used to predict these properties of small-molecule drugs. However, these methods have limited accuracy and are computationally expensive. To overcome this problem and improve the performance of models that predict the properties of drug molecules, we propose a novel architecture that uses a "bond order matrix" and structural information to enrich molecular graph representations with more of the information contained in the molecule. Message-passing neural networks (MPNNs) are a framework for learning local and global features from irregularly structured data in a permutation-invariant way. We build on the MPNN architecture and introduce a "semi-master node," a new way of representing the functional groups in a small molecule and aggregating the features obtained from them, with a view to eventually reverse engineering small molecules from desired physicochemical properties. This architecture and molecular representation were evaluated on the QM9 dataset, which contains about 133,000 stable small organic molecules with up to nine heavy atoms (C, O, N, F) drawn from the GDB-17 chemical universe. Model performance is measured by DFT error, an estimate of the average error in each molecule's predicted properties. Our models show a performance gain of ~10%.
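The abstract does not spell out how the bond order matrix enters the MPNN. As a minimal, hypothetical sketch (pure Python, no learned weights), one can store bond orders in a symmetric N x N matrix, use them to weight the messages each atom receives from its neighbors, and then sum over atoms for a permutation-invariant readout. The function names, the one-hot element features, and the update rule h_i' = h_i + sum_j B[i][j] * h_j are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical node vocabulary: the heavy-atom elements found in QM9.
ELEMENTS = ("C", "N", "O", "F")


def one_hot(symbol: str) -> List[float]:
    """Encode an element symbol as a one-hot feature vector over ELEMENTS."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def bond_order_matrix(n_atoms: int,
                      bonds: Sequence[Tuple[int, int, float]]) -> List[List[float]]:
    """Symmetric N x N matrix; entry (i, j) is the bond order between atoms
    i and j (1.0 single, 2.0 double, 3.0 triple, 1.5 aromatic), 0.0 if unbonded."""
    B = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        B[i][j] = order
        B[j][i] = order
    return B


def message_passing_step(h: List[List[float]],
                         B: List[List[float]]) -> List[List[float]]:
    """One parameter-free update h_i' = h_i + sum_j B[i][j] * h_j, so a double
    bond carries twice the message weight of a single bond. A real MPNN would
    apply learned message and update functions here (assumption)."""
    n, d = len(h), len(h[0])
    updated = []
    for i in range(n):
        msg = [0.0] * d
        for j in range(n):
            if B[i][j] != 0.0:
                for k in range(d):
                    msg[k] += B[i][j] * h[j][k]
        updated.append([h[i][k] + msg[k] for k in range(d)])
    return updated


def readout(h: List[List[float]]) -> List[float]:
    """Sum over atoms; the result does not depend on the order atoms are listed in."""
    return [sum(col) for col in zip(*h)]


# Heavy atoms of acetic acid, CH3-C(=O)-OH: C0, C1, O2 (carbonyl O), O3 (hydroxyl O).
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 1.0)]
B = bond_order_matrix(len(atoms), bonds)
h1 = message_passing_step([one_hot(a) for a in atoms], B)
print(readout(h1))  # [7.0, 0.0, 5.0, 0.0]
```

Summing per-atom vectors in the readout, rather than concatenating them, is what makes the output independent of atom ordering, which is the permutation invariance the abstract attributes to MPNNs.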
Chemistry
What problem does this paper attempt to address?
This paper addresses how to predict the physicochemical properties of small-molecule drugs more accurately and efficiently. Existing methods have limited accuracy and are computationally expensive. To address this, the authors propose a message-passing neural network (MPNN) architecture that combines a "bond order matrix" with "semi-master nodes." The bond order matrix is a molecular representation that retains information about the types and properties of chemical bonds, while the semi-master nodes aggregate the features of functional groups, with the longer-term aim of reverse engineering small molecules that have specified physicochemical properties. The main contributions are:
1. An enhanced molecular input representation, the bond order matrix, which ensures that information about bond types and properties is not lost.
2. Semi-master nodes in the MPNN architecture that inject structural information, such as ring (cyclic) structures and aromaticity, which affects many physicochemical properties (for example, dipole moment and potential energy).
3. An improved molecular representation that increases model efficiency, reducing computation time and memory requirements.
The approach is evaluated on the QM9 dataset, which contains about 133,000 stable small organic molecules annotated with various physicochemical properties. Using DFT error as the performance metric, the new model improves predictive performance by approximately 10%. The authors' goal is to improve both the accuracy and the computational efficiency of predicting drug-molecule properties, which is crucial to drug design and development.
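The exact form of the semi-master node is not given here, so this is only one possible reading, sketched under stated assumptions: each functional group (in the example, an aromatic ring and a hydroxyl group) gets an extra node whose vector is the mean of its member atoms' features concatenated with structural descriptors such as aromaticity, and the molecule embedding joins an atom-level sum with a group-level sum. The group definitions, the [is_aromatic, is_ring] descriptors, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A group is (member atom indices, structural descriptors). The descriptors used
# here, [is_aromatic, is_ring], are illustrative assumptions.
Group = Tuple[Sequence[int], List[float]]


def semi_master_features(h: List[List[float]],
                         groups: Dict[str, Group]) -> Dict[str, List[float]]:
    """Build one hypothetical 'semi-master node' vector per functional group:
    the mean of its member atoms' features, concatenated with the group's
    structural descriptors."""
    pooled = {}
    dim = len(h[0])
    for name, (members, descriptors) in groups.items():
        mean = [sum(h[i][k] for i in members) / len(members) for k in range(dim)]
        pooled[name] = mean + list(descriptors)
    return pooled


def molecule_embedding(h: List[List[float]],
                       groups: Dict[str, Group]) -> List[float]:
    """Concatenate a sum over atoms with a sum over semi-master nodes. Both sums
    are order-independent, so the embedding stays permutation-invariant.
    Assumes at least one group."""
    atom_part = [sum(col) for col in zip(*h)]
    pooled = semi_master_features(h, groups)
    group_part = [sum(col) for col in zip(*pooled.values())]
    return atom_part + group_part


# Heavy atoms of phenol: six aromatic ring carbons (0-5) and the hydroxyl O (6).
# Atom features are [is_C, is_O].
h = [[1.0, 0.0] for _ in range(6)] + [[0.0, 1.0]]
groups = {
    "aromatic_ring": (tuple(range(6)), [1.0, 1.0]),
    "hydroxyl": ((6,), [0.0, 0.0]),
}
print(molecule_embedding(h, groups))  # [6.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Mean pooling keeps each semi-master vector on the same scale regardless of group size, while the explicit descriptors let ring and aromaticity information reach the readout even when atom features alone would not distinguish, say, an aromatic carbon from an aliphatic one.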