CTAGE: Curvature-Based Topology-Aware Graph Embedding for Learning Molecular Representations

Yili Chen, Zhengyu Li, Zheng Wan, Hui Yu, Xian Wei
2024-01-18
Abstract: AI-driven drug design relies significantly on predicting molecular properties, which is a complex task. In current approaches, the most commonly used feature representations for training deep neural network models are based on SMILES and molecular graphs. While these methods are concise and efficient, they have limitations in capturing complex spatial information. Recently, researchers have recognized the importance of incorporating three-dimensional information of molecular structures into models. However, capturing spatial information requires the introduction of additional units in the generator, bringing additional design and computational costs. Therefore, it is necessary to develop a method for predicting molecular properties that effectively combines spatial structural information while maintaining the simplicity and efficiency of graph neural networks. In this work, we propose an embedding approach CTAGE, utilizing $k$-hop discrete Ricci curvature to extract structural insights from molecular graph data. This effectively integrates spatial structural information while preserving the training complexity of the network. Experimental results indicate that introducing node curvature significantly improves the performance of current graph neural network frameworks, validating that the information from $k$-hop node curvature effectively reflects the relationship between molecular structure and function.
Machine Learning, Artificial Intelligence, Quantitative Methods
### What problem does this paper attempt to solve?

This paper addresses how to incorporate spatial structure information into molecular property prediction for drug design while preserving the simplicity and efficiency of graph neural networks (GNNs). Specifically:

1. **Limitations of existing methods**:
   - The feature representations currently used to train deep neural network models are mainly based on SMILES strings and molecular graphs. Although these representations are concise and efficient, they capture complex three-dimensional spatial information poorly.
   - Introducing three-dimensional information helps model performance, but it incurs additional design and computational costs.
2. **Research objectives**:
   - Develop a method that effectively incorporates the spatial structure information of molecules without significantly increasing computational complexity, so as to improve the accuracy of molecular property prediction.
   - Propose a curvature-based topology-aware graph embedding method (CTAGE), which uses discrete Ricci curvature to extract structural information from molecular graph data, thereby enhancing the ability of graph neural networks to learn from complex molecular graphs.

### Main contributions of the paper

- **Introducing k-hop discrete Ricci curvature**: By calculating node curvature at different hop counts, CTAGE captures complex spatial information in the molecular structure while keeping the complexity of network training unchanged.
- **Experimental verification**: The experimental results show that introducing node curvature significantly improves the performance of current graph neural network frameworks, indicating that k-hop node curvature information reflects the relationship between molecular structure and function.
### Formula display

- **Forman-Ricci curvature calculation formula**:
  \[ F(e)=w_e\left(\frac{w_{v_i}}{w_e}+\frac{w_{v_j}}{w_e}-\sum_{e_{v_i}\sim e,\,e_{v_j}\sim e}\left[\frac{w_{v_i}}{\sqrt{w_e w_{e_{v_i}}}}+\frac{w_{v_j}}{\sqrt{w_e w_{e_{v_j}}}}\right]\right) \]
  where \(e\) is an edge, \(v_i\) and \(v_j\) are the two nodes connected by the edge \(e\), \(w_e\) is the weight of the edge \(e\), \(w_{v_i}\) and \(w_{v_j}\) are the weights of the nodes \(v_i\) and \(v_j\), and \(e_{v_i}\sim e\) (respectively \(e_{v_j}\sim e\)) ranges over the edges other than \(e\) that are incident to \(v_i\) (respectively \(v_j\)), with weights \(w_{e_{v_i}}\) and \(w_{e_{v_j}}\).
- **Node curvature calculation formula**:
  \[ F(v)=\frac{1}{\text{deg}(v)}\sum_{e\sim v}F(e) \]
  where \(\text{deg}(v)\) is the degree of the node \(v\) and the sum runs over the edges incident to \(v\).
- **Negative curvature conversion formula**:
  \[ F(v)=\left(\frac{\text{cur}(v)-\text{cur}_{\min}}{\text{cur}_{\max}-\text{cur}_{\min}}\right) \]
  where \(\text{cur}(v)\) is the node curvature before conversion, and \(\text{cur}_{\min}\) and \(\text{cur}_{\max}\) are the minimum and maximum node curvature values. This rescales curvatures, which may be negative, into the range \([0, 1]\).

Through these methods, CTAGE can better capture the complex spatial information in the molecular structure while maintaining high efficiency, thereby improving the accuracy of molecular property prediction.
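The edge curvature, node curvature, and rescaling steps listed under "Formula display" can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes an unweighted graph (all node and edge weights equal to 1), in which case the standard Forman-Ricci curvature of an edge reduces to \(F(e) = 4 - \text{deg}(v_i) - \text{deg}(v_j)\). The function names, the adjacency-dictionary input format, and the toy graph are all hypothetical, and the sketch covers only the one-hop quantities, not the k-hop extension.

```python
def forman_edge_curvature(adj):
    """Unweighted Forman-Ricci curvature: F(e) = 4 - deg(u) - deg(v).

    `adj` maps each node id (assumed to be a comparable value such as an
    int) to the set of its neighbours. Edges are keyed as (min, max).
    """
    curvature = {}
    for u, neighbours in adj.items():
        for v in neighbours:
            if u < v:  # visit each undirected edge once
                curvature[(u, v)] = 4 - len(adj[u]) - len(adj[v])
    return curvature


def node_curvature(adj, edge_curv):
    """Average the curvature of the edges incident to each node."""
    result = {}
    for v, neighbours in adj.items():
        if not neighbours:
            result[v] = 0.0  # assumption: isolated nodes get curvature 0
            continue
        total = sum(edge_curv[(min(u, v), max(u, v))] for u in neighbours)
        result[v] = total / len(neighbours)
    return result


def min_max_normalize(values):
    """Rescale possibly negative curvatures into [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # assumption: a constant curvature profile maps to all zeros
        return {k: 0.0 for k in values}
    return {k: (x - lo) / (hi - lo) for k, x in values.items()}


# Toy heavy-atom skeleton (hypothetical): chain 0-1-2-3 with a branch 1-4.
toy_adj = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2}, 4: {1}}

edge_curv = forman_edge_curvature(toy_adj)
node_curv = node_curvature(toy_adj, edge_curv)
node_feat = min_max_normalize(node_curv)
print(edge_curv)
print(node_feat)
```

In this toy graph the branching node 1 receives the lowest curvature and the chain end 3 the highest, so after rescaling they map to 0 and 1. The resulting per-node values could then be appended to the usual atom features before they are passed to a GNN; how CTAGE combines them with other features is not specified in this summary.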