Abstract: Motivation: Effective molecular representation is critical in drug development. Because molecules are structurally complex, a comprehensive representation needs multiple views, covering their 1D, 2D, and 3D aspects. Representations that encompass all of these structures are crucial for a holistic understanding of molecules in drug-related contexts. Results: In this study, we introduce a multi-view contrastive learning framework for molecular representation, denoted MolMVC. We first use a Transformer encoder to capture 1D sequence information and a Graph Transformer to encode the 2D and 3D structural details of molecules. Our approach incorporates a novel attention-guided augmentation scheme that leverages prior knowledge to create positive samples tailored to the different molecular data views. To align the multi-view positive samples effectively in latent space, we introduce an adaptive multi-view contrastive loss (AMCLoss). In particular, we calculate AMCLoss at several levels within the model to capture the hierarchical nature of molecular information. Finally, we pre-train the encoders by minimizing AMCLoss to obtain the molecular representation, which can be used for various downstream tasks. In our experiments, we evaluate MolMVC on multiple tasks, including molecular property prediction (MPP), drug-target binding affinity (DTA) prediction, and cancer drug response (CDR) prediction. The results demonstrate that the molecular representation learned by MolMVC can enhance predictive accuracy on these tasks and also reduce computational costs. Furthermore, we showcase MolMVC's efficacy in drug repositioning across a spectrum of drug-related applications. Availability and implementation: The code and pre-trained model are publicly available at https://github.com/Hhhzj-7/MolMVC.
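The abstract does not give the form of AMCLoss, so the following is only a minimal sketch of the general idea it builds on: pulling embeddings of the same molecule from two views together while pushing other molecules in the batch apart. It uses a standard symmetric InfoNCE loss, not the adaptive, multi-level loss of MolMVC. The function names (`l2_normalize`, `info_nce`), the temperature value, and the toy embeddings are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(view_a, view_b, temperature=0.1):
    """Symmetric InfoNCE loss between two views of the same batch.

    view_a[i] and view_b[i] are embeddings of the same molecule (a positive
    pair); every other pairing in the batch is treated as a negative.
    This is a generic contrastive loss, NOT the AMCLoss from the paper.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    n = len(a)

    # sim[i][j]: temperature-scaled cosine similarity of a[i] and b[j].
    sim = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(logits, target):
        # Numerically stable -log softmax(logits)[target].
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        return log_z - logits[target]

    # View A retrieves its partner in view B, and vice versa.
    loss_ab = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    loss_ba = sum(
        cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_ab + loss_ba)


# Toy embeddings (hypothetical): three molecules, two dimensions per view.
seq_view = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
graph_aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
graph_shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]

print(info_nce(seq_view, graph_aligned))   # low loss: views agree
print(info_nce(seq_view, graph_shuffled))  # higher loss: views disagree
```

In this toy setting, the loss is small when the two views agree on which embeddings belong together and grows when the pairing is shuffled, which is the signal a pre-training objective of this kind minimizes.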

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations

Towards 3D Molecule-Text Interpretation in Language Models

3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

MolFM: A Multimodal Molecular Foundation Model

MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model

MolBind: Multimodal Alignment of Language, Molecules, and Proteins

MolTC: Towards Molecular Relational Modeling In Language Models

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

Uni-Mol: A Universal 3D Molecular Representation Learning Framework

Vision Language Model is NOT All You Need: Augmentation Strategies for Molecule Language Models

Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey

Multilingual Molecular Representation Learning via Contrastive Pre-training

3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information

MolMVC: Enhancing molecular representations for drug-related tasks through multi-view contrastive learning

Large-scale chemical language representations capture molecular structure and properties

MolXPT: Wrapping Molecules with Text for Generative Pre-training

UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation