Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

Romain Lacombe, Andrew Gaut, Jeff He, David Lüdeke, Kateryna Pistunova

2023-07-22

Abstract: Deep learning in computational biochemistry has traditionally focused on neural representations of molecular graphs; however, recent advances in language models highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study property prediction performance gains after using contrastive learning to align neural graph representations with representations of textual descriptions of their characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel chemically-valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain compared to the recently proposed MoMu model, which is contrastively trained on molecular graphs and text (Su et al. 2022).

Machine Learning, Artificial Intelligence, Computation and Language, Information Retrieval, Quantitative Methods

What problem does this paper attempt to address?

The paper addresses how natural language descriptions can be used to improve molecular property prediction. Specifically, the authors use multimodal contrastive learning to combine molecular graph representations with textual descriptions. Their approach rests on three strategies:

1. **Multimodal Contrastive Learning**: Contrastive learning aligns molecular graph representations and their related textual descriptions in a shared latent space. This transfers molecular property information from scientific literature into the graph representations.
2. **Relevance-Enhanced Text Sampling**: A neural relevance scoring method improves text retrieval, so that the sampled text fragments are highly relevant to the molecule's properties.
3. **Chemically Valid Graph Augmentation**: A new molecular graph augmentation strategy inspired by organic reactions generates augmented graphs that remain chemically valid.

With these strategies, performance improves on downstream MoleculeNet property classification tasks. On average, AUROC improves by 4.26% compared to models pre-trained on the graph modality alone, and by 1.54% compared to the recently proposed MoMu model. These results indicate that information extracted from scientific literature is valuable for improving molecular graph representations and their property predictions.
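To make the alignment step concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective over paired graph and text embeddings. The text above does not state the exact loss the authors use, so the loss form, the `temperature` value, and the function names (`symmetric_info_nce`, `cross_entropy_diag`) are assumptions chosen for illustration. Embeddings are plain Python lists rather than encoder outputs.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(graph_emb, text_emb, temperature=0.1):
    """Illustrative symmetric contrastive loss (assumed form, not the paper's code).

    Pair i (graph i, text i) is the positive; every other pairing in the
    batch serves as a negative.
    """
    g = [normalize(v) for v in graph_emb]
    t = [normalize(v) for v in text_emb]
    # Cosine similarities between every graph and every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(gi, tj)) / temperature for tj in t]
              for gi in g]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-graph direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy comparison: text embeddings close to their graphs vs. unrelated ones.
rng = random.Random(0)
graphs = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
aligned_text = [[x + 0.01 * rng.gauss(0, 1) for x in v] for v in graphs]
unrelated_text = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]

aligned_loss = symmetric_info_nce(graphs, aligned_text)
unrelated_loss = symmetric_info_nce(graphs, unrelated_text)
print(aligned_loss < unrelated_loss)
```

In this toy setup the loss is lower when each text embedding sits close to its paired graph embedding, which is the behavior that contrastive pre-training optimizes for. In practice the embeddings would come from a graph neural network and a text encoder trained jointly.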

DIG-Mol: A Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction

Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction

Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations

A Novel Descriptor and Molecular Graph-Based Bimodal Contrastive Learning Framework for Drug Molecular Property Prediction

Molecular contrastive learning of representations via graph neural networks

Asymmetric Contrastive Multimodal Learning for Advancing Chemical Understanding

Attention-wise masked graph contrastive learning for predicting molecular property

Molecular Representation Contrastive Learning Via Transformer Embedding to Graph Neural Networks

Cross‐Modal Graph Contrastive Learning with Cellular Images

MolPROP: Molecular Property prediction with multimodal language and graph fusion

Knowledge-aware Contrastive Molecular Graph Learning

Molecular Contrastive Learning with Chemical Element Knowledge Graph

Graph Multi-Similarity Learning for Molecular Property Prediction

Multilingual Molecular Representation Learning via Contrastive Pre-training

MoCL: Contrastive Learning on Molecular Graphs with Multi-level Domain Knowledge

Boosting the performance of molecular property prediction via graph–text alignment and multi-granularity representation enhancement

MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph

3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information