Self-Supervised Graph Information Bottleneck for Multiview Molecular Embedding Learning

Shiye Wang, Ye Yuan, Kaihang Mao, Changsheng Li, Guoren Wang
DOI: https://doi.org/10.1109/TAI.2023.3297576
2024-04-01
IEEE Transactions on Artificial Intelligence
Abstract: In the field of computer-aided drug discovery, identifying promising drug candidates from small molecule libraries requires meaningful molecular embeddings for downstream tasks, such as property prediction. However, obtaining experimentally determined molecular property measurements is often expensive and time-consuming, making it challenging to train molecular encoders with limited supervision. In addition, molecules can be represented in two ways: as 2-D chemical-bond structures and 3-D geometry structures. Molecular embedding learning using only one of these representations can result in information loss, and effective fusion of the two views has not been fully explored. To address these challenges, we propose a new approach called the self-supervised multiview graph neural network (SMV-GNN) for molecular embedding learning. Our approach involves a self-supervised task that promotes the representation ability of the molecular encoder without requiring extra human-annotation data. Specifically, we use chemical-bond-based graph structures as inputs to predict interatom distances from the 2-D view and randomly shuffle a ratio of atoms in the 3-D coordinate-based graphs to predict atom rationality from the 3-D view. We further improve the representation ability of the molecular embedding by using information bottleneck to learn essential shared feature representations by discarding superfluous information from the 2-D/3-D views for downstream tasks. We evaluate our proposed SMV-GNN approach on seven benchmark datasets for molecule property-prediction tasks, and demonstrate that it outperforms the current state-of-the-art methods.
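The two self-supervised tasks named in the abstract can be illustrated by how their training targets might be built. The following is a minimal sketch and not the authors' implementation: the function names `pairwise_distances` and `shuffle_atoms` are hypothetical, atoms are assumed to be given as an array of atomic numbers with matching 3-D coordinates, and "shuffling a ratio of atoms" is assumed to mean permuting atom types among a randomly chosen subset of positions, with an atom labeled rational when its type at a position is unchanged.

```python
import numpy as np


def pairwise_distances(coords):
    """Interatom distance targets for the 2-D view (assumed regression labels).

    coords: (n_atoms, 3) array of 3-D positions.
    Returns an (n_atoms, n_atoms) matrix of Euclidean distances.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def shuffle_atoms(atom_types, ratio, rng):
    """Corrupt a 3-D graph by permuting atom types at a subset of positions.

    atom_types: (n_atoms,) integer array, e.g. atomic numbers.
    ratio: fraction of positions whose atom types are permuted (assumption).
    Returns the corrupted types and per-atom labels
    (1 = type unchanged at this position, 0 = type changed).
    """
    n = len(atom_types)
    # At least two positions are needed for a permutation to change anything.
    k = min(n, max(2, int(round(ratio * n))))
    idx = rng.choice(n, size=k, replace=False)
    shuffled = atom_types.copy()
    shuffled[idx] = atom_types[rng.permutation(idx)]
    # Swapping two atoms of the same element leaves both positions "rational".
    labels = (shuffled == atom_types).astype(np.int64)
    return shuffled, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(size=(5, 3))          # toy coordinates, not a real molecule
    atom_types = np.array([6, 6, 8, 1, 1])    # C, C, O, H, H
    print(pairwise_distances(coords).shape)
    print(shuffle_atoms(atom_types, 0.4, rng))
```

In this sketch, an encoder over the chemical-bond graph would regress the distance matrix, and an encoder over the coordinate graph would classify each atom using the labels. How the paper fuses the two views with the information bottleneck is not specified in the abstract and is not covered here.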
Computer Science, Medicine, Chemistry