Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning

Jiahao Chen, Yurou Liu, Jiangmeng Li, Bing Su, Jirong Wen
2023-05-22
Abstract: Molecular representation learning is a crucial task in predicting molecular properties. Molecules are often modeled as graphs where atoms and chemical bonds are represented as nodes and edges, respectively, and Graph Neural Networks (GNNs) have been commonly utilized to predict atom-related properties, such as reactivity and solubility. However, functional groups (subgraphs) are closely related to some chemical properties of molecules, such as efficacy and metabolic properties, which cannot be solely determined by individual atoms. In this paper, we introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA), which addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information. ASBA consists of two branches, one for atom-wise information and the other for subgraph-wise information. Considering existing atom-wise GNNs cannot properly extract invariant subgraph features, we propose a decomposition-polymerization GNN architecture for the subgraph-wise branch. Furthermore, we propose cooperative node-level and graph-level self-supervised learning strategies for ASBA to improve its generalization. Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications. Extensive experiments have demonstrated the effectiveness of our method.
Machine Learning, Artificial Intelligence, Quantitative Methods
What problem does this paper attempt to address?
The problem this paper addresses is that existing methods for molecular property prediction usually rely on either atom-level information or functional-group (subgraph) level information, and do not account for the influence of both on molecular properties at the same time. Specifically:

1. **Atom-level models**: These models use graph neural networks (GNNs) to predict atom-related properties, such as reactivity and solubility. However, they cannot capture chemical properties that are determined by functional groups, such as drug efficacy and metabolic properties.
2. **Functional-group-level models**: These models focus on functional groups (subgraphs), which are crucial for certain chemical properties. However, they often overlook the influence of individual atoms, which hurts the prediction of atom-related properties.

To overcome these limitations, the paper proposes a two-branch model, Atomic and Subgraph-aware Bilateral Aggregation (ASBA). ASBA combines atom-level and functional-group-level information to learn more comprehensive molecular representations and thereby improve the accuracy of molecular property prediction.

Specific contributions include:

1. **Two-branch model**: ASBA contains two branches, one modeling atom-level information and the other modeling functional-group-level information. For the functional-group branch, the paper proposes a decomposition-polymerization GNN architecture, which embeds each functional group independently and then aggregates the functional-group representations to form the final molecular representation.
2. **Self-supervised learning method**: The paper proposes cooperative node-level and graph-level self-supervised learning to jointly pre-train the two branches of ASBA. In particular, for the functional-group branch, it proposes a Masked Subgraph-Token Modeling (MSTM) strategy. MSTM uses automatically discovered functional groups as tokens and predicts the indices of masked tokens, so that the branch better captures functional-group-level information and the relationships between functional groups.
3. **Experimental verification**: Extensive experiments verify the effectiveness of ASBA and its self-supervised learning method, with stronger generalization reported in particular on molecular property prediction tasks related to functional groups.

In summary, this paper aims to provide a more comprehensive and accurate method for molecular property prediction by combining atom-level and functional-group-level information.
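
The two-branch idea can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`message_pass`, `atom_branch`, `embed_subgraphs`, `subgraph_branch`, `bilateral_readout`) are hypothetical, a single round of mean aggregation stands in for a real GNN layer, the partition into functional groups is assumed to be given, and concatenating the two branch readouts is an assumption, since the summary does not say how the branches are fused.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Edge = Tuple[int, int]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def vec_sum(vectors: Sequence[Vector]) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(col) for col in zip(*vectors)]


def message_pass(feats: List[Vector], edges: List[Edge],
                 allowed: Optional[Sequence[int]] = None) -> Dict[int, Vector]:
    """One round of mean aggregation over each atom and its bonded neighbours.

    If `allowed` is given, only those atoms and the bonds between them are
    used, so atoms outside the subgraph cannot influence the result.
    """
    nodes = sorted(allowed) if allowed is not None else list(range(len(feats)))
    node_set = set(nodes)
    neigh = {i: [i] for i in nodes}  # each atom also keeps its own feature
    for u, v in edges:
        if u in node_set and v in node_set:
            neigh[u].append(v)
            neigh[v].append(u)
    return {i: mean([feats[j] for j in neigh[i]]) for i in nodes}


def atom_branch(feats: List[Vector], edges: List[Edge]) -> Vector:
    """Atom-wise branch: message passing on the whole graph, then sum pooling."""
    return vec_sum(list(message_pass(feats, edges).values()))


def embed_subgraphs(feats: List[Vector], edges: List[Edge],
                    subgraphs: List[List[int]]) -> List[Vector]:
    """Decomposition step: embed every functional group independently."""
    return [mean(list(message_pass(feats, edges, allowed=atoms).values()))
            for atoms in subgraphs]


def subgraph_branch(feats: List[Vector], edges: List[Edge],
                    subgraphs: List[List[int]]) -> Vector:
    """Polymerization step: aggregate the group embeddings (here, by summing)."""
    return vec_sum(embed_subgraphs(feats, edges, subgraphs))


def bilateral_readout(feats: List[Vector], edges: List[Edge],
                      subgraphs: List[List[int]]) -> Vector:
    """Concatenate both branch readouts (an assumed fusion rule)."""
    return atom_branch(feats, edges) + subgraph_branch(feats, edges, subgraphs)


# Toy molecule: a 4-atom chain split into two made-up functional groups.
atom_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2), (2, 3)]
groups = [[0, 1], [2, 3]]

print(bilateral_readout(atom_feats, bonds, groups))
```

In this sketch the atom branch sees the bond between atoms 1 and 2, so information flows across the group boundary, while each group embedding depends only on the atoms inside that group. That separation is what lets the subgraph branch produce the same embedding for a functional group regardless of the rest of the molecule.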