Graph Residual based Method for Molecular Property Prediction

Kanad Sen, Saksham Gupta, Abhishek Raj, Alankar Alankar
2024-10-07
Abstract: Machine learning-driven methods for property prediction have been of deep interest. However, much work remains to be done to improve the generalization ability, accuracy, and inference time for critical applications. Traditional machine learning models predict properties based on features extracted from the molecules, which are often not easily available. In this work, a novel deep learning method, the Edge Conditioned Residual Graph Neural Network (ECRGNN), has been applied, allowing us to predict properties directly from only the graph-based structures of the molecules. The SMILES (Simplified Molecular Input Line Entry System) representation of the molecules has been used as the input data format and has been further converted into a graph database, which constitutes the training data. This manuscript gives a detailed description of the novel GRU-based methodology, ECRGNN, used to map the inputs. Emphasis is placed on both the regression and the classification performance of the method. A detailed description of the Variational Autoencoder (VAE) and the end-to-end learning method used for multi-class multi-label property prediction has been provided as well. The results have been compared on standard benchmark datasets as well as some newly developed datasets. All performance metrics that have been used are clearly defined, along with the reasons for their choice.
Quantitative Methods, Machine Learning
What problem does this paper attempt to address?
This paper aims to improve the accuracy, generalization ability, and inference time of molecular property prediction. Traditional machine-learning models predict properties from features extracted from molecules, and these features are often difficult to obtain. The paper proposes a deep-learning method, the Edge Conditioned Residual Graph Neural Network (ECRGNN), which predicts molecular properties directly from the graph structure of molecules (derived from their SMILES representation). According to the paper, this improves prediction accuracy, enhances the generalization ability of the model, and reduces inference time. The main contributions of the paper are:
1. **Proposing the ECRGNN model**: The model extends the traditional graph neural network (GNN) with Edge Conditioned Graph Convolution (ECC) and a Gated Recurrent Unit (GRU). ECC lets the model take edge features (such as aromaticity and bond order) into account during message passing, while the GRU stores and propagates the state of each node, thereby capturing long-range dependencies.
2. **Handling the variable size of molecular graphs**: Since the size and shape of molecular graphs vary, the model encounters molecules of different sizes during training and testing. Adding a GRU layer after each "hop" and a residual connection between the first and third hops helps the model adapt to molecular graphs of different sizes.
3. **Multi-task learning**: The model can perform regression tasks (such as predicting the physical and chemical properties of molecules) as well as classification tasks (such as distinguishing between toxic and non-toxic compounds). The paper describes in detail how a variational autoencoder (VAE) and end-to-end learning are applied to multi-class multi-label prediction.
4. **Experimental verification**: The paper reports experiments on standard benchmark datasets and newly developed datasets, compares the performance of the model across tasks, and defines the evaluation metrics together with the reasons for their selection.
In summary, the paper introduces the ECRGNN model to improve the accuracy and efficiency of molecular property prediction and to enhance the generalization ability of the model.
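The message-passing scheme summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `EdgeConditionedHop`, `build_graph`, and `forward`, the feature encodings, the hidden size, the single linear layer used as the edge filter network, the separate parameters per hop, and the sum readout are all hypothetical choices made for the example. The weights are random and untrained, so the output only demonstrates the data flow: edge-conditioned messages, a GRU update after each hop, and a residual connection between the first and third hops.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class EdgeConditionedHop:
    """One hop: edge-conditioned messages followed by a GRU node update."""

    def __init__(self, dim, edge_dim, rng):
        self.dim = dim
        # Filter network (assumed: one linear layer) maps edge features
        # to the dim x dim weight matrix applied to the neighbour state.
        self.filter_w = rand_matrix(dim * dim, edge_dim, rng)
        # GRU parameters: W acts on the message, U on the previous state.
        self.W = {g: rand_matrix(dim, dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(dim, dim, rng) for g in "zrh"}

    def edge_weight(self, edge_feat):
        flat = matvec(self.filter_w, edge_feat)
        d = self.dim
        return [flat[i * d:(i + 1) * d] for i in range(d)]

    def gru(self, m, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], m), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], m), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], m), matvec(self.U["h"], rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def __call__(self, states, edges):
        msgs = [[0.0] * self.dim for _ in states]
        for (src, dst), feat in edges.items():
            # The weight matrix depends on the bond, so single and double
            # bonds transform the neighbour state differently.
            contrib = matvec(self.edge_weight(feat), states[src])
            msgs[dst] = [a + b for a, b in zip(msgs[dst], contrib)]
        return [self.gru(m, h) for m, h in zip(msgs, states)]

# Hypothetical one-hot encodings for atoms and bonds.
ATOM = {"C": [1, 0, 0, 0], "O": [0, 1, 0, 0], "N": [0, 0, 1, 0]}
BOND = {"single": [1, 0, 0], "double": [0, 1, 0], "aromatic": [0, 0, 1]}

def build_graph(atoms, bonds):
    """Return initial node states and a directed edge dict for a molecule."""
    states = [[float(v) for v in ATOM[a]] for a in atoms]
    edges = {}
    for i, j, kind in bonds:
        edges[(i, j)] = BOND[kind]  # bonds are undirected, so store
        edges[(j, i)] = BOND[kind]  # both directions
    return states, edges

def forward(states, edges, hops):
    """Three hops with a residual from hop 1 to hop 3, then a sum readout."""
    h1 = hops[0](states, edges)
    h2 = hops[1](h1, edges)
    h3 = hops[2](h2, edges)
    h3 = [[a + b for a, b in zip(x, y)] for x, y in zip(h3, h1)]
    # Summing over atoms gives a fixed-length vector for any molecule size.
    return [sum(col) for col in zip(*h3)]

rng = random.Random(0)
DIM = 4
HOPS = [EdgeConditionedHop(DIM, 3, rng) for _ in range(3)]

ethanol = build_graph(["C", "C", "O"], [(0, 1, "single"), (1, 2, "single")])  # SMILES: CCO
formaldehyde = build_graph(["C", "O"], [(0, 1, "double")])                     # SMILES: C=O

print(forward(*ethanol, HOPS))
print(forward(*formaldehyde, HOPS))
```

Both molecules map to a vector of the same length even though they have different numbers of atoms, which is the property that lets a downstream regression or classification head be shared across molecules. In practice the SMILES strings would be parsed with a cheminformatics toolkit and the parameters trained by gradient descent; both steps are omitted here.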