Molecular Property Prediction by Contrastive Learning with Attention-Guided Positive Sample Selection

Jinxian Wang, Jihong Guan, Shuigeng Zhou
DOI: https://doi.org/10.1093/bioinformatics/btad258
IF: 5.8
2023-04-20
Bioinformatics
Abstract:
Motivation: Predicting molecular properties is one of the fundamental problems in drug design and discovery. In recent years, self-supervised learning has shown promising performance in image recognition, natural language processing, and single-cell data analysis. Contrastive learning is a typical self-supervised learning method that learns features of the data so that the trained model can more effectively distinguish between samples. One important issue in contrastive learning is how to select positive samples for each training example, a choice that significantly affects its performance.
Results: In this paper, we propose a new method for molecular property prediction by Contrastive Learning with Attention-guided Positive-sample Selection (CLAPS). First, we generate positive samples for each training example based on an attention-guided selection scheme. Second, we employ a Transformer encoder to extract latent feature vectors and compute a contrastive loss that aims to distinguish positive sample pairs from negative ones. Finally, we use the trained encoder to predict molecular properties. Experiments on various benchmark datasets show that our approach outperforms state-of-the-art (SOTA) methods in most cases.
Availability: The code is publicly available at https://github.com/wangjx22/CLAPS.
Supplementary information: Supplementary data are available at Bioinformatics online.
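The contrastive objective mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which loss CLAPS uses, so the code below assumes a generic InfoNCE-style loss over cosine similarities; the function names, the `temperature` parameter, and the toy embeddings are all hypothetical and are not taken from the CLAPS repository. The attention-guided selection of positives and the Transformer encoder are not modeled here; the vectors simply stand in for encoder outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (an assumed stand-in loss).

    Returns -log of the softmax probability assigned to the positive
    among the positive and all negatives.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy embeddings standing in for encoder outputs (hypothetical values).
anchor = [1.0, 0.0]
close_view = [0.9, 0.1]    # similar to the anchor
far_view = [0.0, 1.0]      # dissimilar to the anchor
other = [-1.0, 0.0]        # another dissimilar sample

# A well-chosen positive (close to the anchor) gives a lower loss than
# a poorly chosen one, which is why positive-sample selection matters.
loss_good_positive = info_nce_loss(anchor, close_view, [far_view, other])
loss_bad_positive = info_nce_loss(anchor, far_view, [close_view, other])
```

In this toy setting the loss is lower when the positive lies close to the anchor in embedding space, which is the behavior a contrastive objective rewards during pretraining.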
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology