Abstract: Deep learning has become prevalent in computational chemistry and is widely used for molecular property prediction. Recently, self-supervised learning (SSL), especially contrastive learning (CL), has gathered growing attention for its potential to learn molecular representations that generalize to the gigantic chemical space. Unlike supervised learning, SSL can directly leverage large unlabeled datasets, which greatly reduces the effort needed to acquire molecular property labels through costly and time-consuming simulations or experiments. However, most molecular SSL methods borrow insights from the machine learning community but neglect the unique cheminformatics (e.g., molecular fingerprints) and multi-level graphical structures (e.g., functional groups) of molecules. In this work, we propose iMolCLR: improvement of Molecular Contrastive Learning of Representations with graph neural networks (GNNs) in two aspects: (1) mitigating faulty negative contrastive instances by considering cheminformatics similarities between molecule pairs; (2) fragment-level contrasting between intra- and inter-molecule substructures decomposed from molecules. Experiments show that the proposed strategies significantly improve the performance of GNN models on various challenging molecular property prediction tasks. Compared with the previous CL framework, iMolCLR achieves an average 1.3% improvement in ROC-AUC on 7 classification benchmarks and an average 4.8% decrease in error on 5 regression benchmarks. On most benchmarks, the generic GNN pre-trained by iMolCLR rivals or even surpasses supervised learning models with sophisticated architecture designs and engineered features. Further investigations demonstrate that representations learned through iMolCLR intrinsically embed scaffolds and functional groups that can be used to reason about molecular similarity.

What problem does this paper attempt to address?

The paper addresses two problems in molecular contrastive learning:

1. **Faulty Negative Mitigation**
   - The conventional contrastive loss treats every negative pair the same way with respect to the anchor molecule, which gives rise to "faulty negatives": molecules that are actually similar to the anchor but are wrongly treated as negatives. Faulty negatives damage the robustness and performance of the contrastive pre-trained model on downstream property prediction tasks.
   - The paper reduces their impact by taking the cheminformatics similarity between molecule pairs (e.g., molecular fingerprints) into account, thereby improving the generalization ability and prediction accuracy of the model.
2. **Fragment-Level Contrast**
   - Most molecular contrastive learning frameworks contrast only whole molecular graphs and ignore the fragments within a molecule. These fragments contain important functional groups and are crucial to many molecular properties.
   - The paper uses BRICS decomposition to break molecules into fragments and contrasts these fragments against one another, both within a molecule and across molecules. This preserves the main structural features of the compound and forces the molecular representation to distinguish the key functional groups within the molecule, which improves molecular property prediction.

Together, these two improvements form the iMolCLR framework, which aims to make molecular contrastive learning more effective. In particular, it can learn more general and effective molecular representations from large-scale unlabeled molecular data and so perform better on a range of molecular property prediction tasks.

Relative to the previous CL framework, iMolCLR improves ROC-AUC by an average of 1.3% on 7 classification benchmarks and reduces error by an average of 4.8% on 5 regression benchmarks. On most benchmarks, the pre-trained GNN rivals or even surpasses supervised learning models with sophisticated architectures and engineered features.
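The idea behind faulty negative mitigation can be illustrated with a small, dependency-free sketch. It is not the paper's implementation: the weighting rule `w = 1 - lam * fingerprint_similarity`, the function names, and the parameters `tau` and `lam` are all assumptions chosen for illustration. Fingerprints are represented here simply as sets of on-bit indices, and Tanimoto similarity stands in for the cheminformatics similarity between molecule pairs.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(u, v):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_nt_xent(anchor, positive, negatives, neg_fp_sims, tau=0.1, lam=0.5):
    """Contrastive loss for one anchor with similarity-weighted negatives.

    anchor, positive: embeddings of two views of the same molecule.
    negatives:        embeddings of other molecules.
    neg_fp_sims:      fingerprint similarity between the anchor molecule and
                      each negative, in [0, 1].
    lam:              hypothetical strength of the mitigation; lam=0 recovers
                      the unweighted loss.
    """
    pos_term = math.exp(cosine(anchor, positive) / tau)
    denom = pos_term
    for z_neg, fp_sim in zip(negatives, neg_fp_sims):
        # Assumed weighting: chemically similar negatives contribute less.
        weight = 1.0 - lam * fp_sim
        denom += math.exp(weight * cosine(anchor, z_neg) / tau)
    return -math.log(pos_term / denom)


if __name__ == "__main__":
    anchor, positive = [1.0, 0.0], [0.9, 0.1]
    negatives = [[0.8, 0.2]]
    # Toy fingerprints: the negative shares most on-bits with the anchor.
    sims = [tanimoto({1, 2, 3, 4}, {1, 2, 3, 5})]
    print("unweighted:", weighted_nt_xent(anchor, positive, negatives, sims, lam=0.0))
    print("weighted:  ", weighted_nt_xent(anchor, positive, negatives, sims, lam=0.5))
```

In this sketch, a negative that is both close to the anchor in embedding space and chemically similar to it is penalized less than it would be under the unweighted loss, so the model is not pushed as hard to separate molecules that genuinely resemble each other. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.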

Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast

Molecular contrastive learning of representations via graph neural networks

Molecular Representation Contrastive Learning Via Transformer Embedding to Graph Neural Networks

3D graph contrastive learning for molecular property prediction

MoCL: Contrastive Learning on Molecular Graphs with Multi-level Domain Knowledge

MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph

Debiased Graph Contrastive Learning

GeomGCL: Geometric Graph Contrastive Learning for Molecular Property Prediction

Molecular Graph Contrastive Learning with Parameterized Explainable Augmentations

DGCL: dual-graph neural networks contrastive learning for molecular property prediction

Molecular Contrastive Learning with Chemical Element Knowledge Graph

GCLmf: A Novel Molecular Graph Contrastive Learning Framework Based on Hard Negatives and Application in Toxicity Prediction

3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information

Look in The Mirror: Molecular Graph Contrastive Learning with Line Graph

Knowledge-aware Contrastive Molecular Graph Learning

Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction

DIG-Mol: A Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction

Molecular Property Prediction by Semantic-invariant Contrastive Learning

A 3D-Shape Similarity-based Contrastive Approach to Molecular Representation Learning

CasANGCL: pre-training and fine-tuning model based on cascaded attention network and graph contrastive learning for molecular property prediction