Molecular Property Prediction Based on Graph Structure Learning

Bangyi Zhao, Weixia Xu, Jihong Guan, Shuigeng Zhou
2023-12-28
Abstract: Molecular property prediction (MPP) is a fundamental but challenging task in the computer-aided drug discovery process. A growing number of recent works employ graph-based models for MPP and have made considerable progress in improving prediction performance. However, current models often ignore relationships between molecules, which could also be helpful for MPP. To this end, in this paper we propose a graph structure learning (GSL) based MPP approach, called GSL-MPP. Specifically, we first apply a graph neural network (GNN) over molecular graphs to extract molecular representations. Then, with molecular fingerprints, we construct a molecular similarity graph (MSG). Following that, we conduct graph structure learning on the MSG (i.e., molecule-level graph structure learning) to get the final molecular embeddings, which fuse the GNN-encoded molecular representations with the relationships among molecules, i.e., they combine intra-molecule and inter-molecule information. Finally, we use these molecular embeddings to perform MPP. Extensive experiments on seven benchmark datasets show that our method achieves state-of-the-art performance in most cases, especially on classification tasks. Further visualization studies also demonstrate the quality of the molecular representations learned by our method.
Machine Learning, Biomolecules
What problem does this paper attempt to address?
The paper addresses a key challenge in Molecular Property Prediction (MPP): how to exploit the relationships between molecules, which current graph-based models often ignore, to improve prediction performance. The authors propose a Graph Structure Learning-based method (GSL-MPP) that considers the structural information within individual molecules and also constructs a Molecular Similarity Graph (MSG) that encodes the similarity between molecules and is then refined through graph structure learning. Specifically, GSL-MPP includes two levels of graph representation learning:

1. **Atom-level molecular graph representation**: A Graph Neural Network (GNN) extracts the initial representation of each molecule from its molecular graph.
2. **Molecule-level graph representation**: A Molecular Similarity Graph (MSG) is constructed, and graph structure learning on this graph iteratively improves the final molecular embeddings.

In the Molecular Similarity Graph, nodes represent molecules and edges represent the relationships between them. To construct the graph, the authors compute the similarity between molecules from their Extended Connectivity Fingerprints (ECFP) and build the initial MSG from these similarities. The graph is then refined through graph structure learning to better capture the relationships between molecules, so that the final embeddings combine intra-molecule and inter-molecule information.

Experimental results on seven benchmark datasets show that GSL-MPP achieves state-of-the-art performance in most cases, with the clearest advantages on classification tasks. This indicates that modeling the relationships between molecules and refining them through graph structure learning is an effective strategy for improving molecular property prediction.
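The construction of the initial similarity graph can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each fingerprint is given as a set of on-bit indices, uses Tanimoto similarity (a common choice for ECFP, not confirmed by the text above), and connects molecules whose similarity reaches a hypothetical `threshold`. The names `tanimoto`, `build_similarity_graph`, and `toy_fps` are illustrative only.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def build_similarity_graph(fingerprints: dict, threshold: float = 0.5) -> dict:
    """Build an undirected similarity graph.

    Nodes are molecule names; an edge (with the similarity as its weight)
    links two molecules whose Tanimoto similarity is >= threshold.
    The threshold rule is an assumption made for this sketch.
    """
    graph = {name: {} for name in fingerprints}
    for u, v in combinations(fingerprints, 2):
        s = tanimoto(fingerprints[u], fingerprints[v])
        if s >= threshold:
            graph[u][v] = s
            graph[v][u] = s
    return graph


# Toy fingerprints: hypothetical bit indices, not real ECFP output.
toy_fps = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 7, 12},
    "mol_c": {20, 21, 22},
}

msg = build_similarity_graph(toy_fps, threshold=0.5)
for name, neighbors in msg.items():
    print(name, neighbors)
```

In this toy example `mol_a` and `mol_b` share three of five distinct bits, so they are connected, while `mol_c` shares none and stays isolated. In practice the fingerprints would come from a cheminformatics toolkit, and the graph structure learning step described above would then adjust these initial edges rather than keep them fixed.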