Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction

Ao Shen, Mingzhi Yuan, Yingfan Ma, Jie Du, Manning Wang

DOI: https://doi.org/10.1093/bib/bbae256

IF: 9.5

2024-05-29

Briefings in Bioinformatics

Abstract: Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on one modality of molecular data, and the complementary information of two important modalities, SMILES and graph, is not fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained by a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between these two modalities. Experimental results show that our framework achieves state-of-the-art performance in a series of molecular property prediction tasks, and a detailed ablation study demonstrates efficacy of the multi-modality framework and the masking strategy.

biochemical research methods, mathematical & computational biology

What problem does this paper attempt to address?

This paper addresses how to exploit the complementary information in two molecular modalities, SMILES sequences and molecular graphs, for self-supervised pre-training aimed at molecular property prediction. Most existing molecular pre-training methods focus on a single modality; the authors instead propose a multimodal self-supervised learning framework in which both modalities are tokenized and processed by a unified Transformer-based backbone trained with masked reconstruction. Because SMILES and graphs provide complementary information, combining them can improve the learned molecular representation. To exploit this, the authors design a non-overlapping masking strategy, which ensures that the parts masked in the SMILES input and the parts masked in the graph input do not overlap during pre-training. The model is thereby encouraged to reconstruct the masked parts of one modality from the corresponding unmasked information in the other, which promotes fine-grained interaction between the modalities. Experiments show that the method achieves state-of-the-art performance on a range of molecular property prediction tasks, and ablation studies verify the contribution of both the multimodal framework and the masking strategy.
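
The idea of non-overlapping masking can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `non_overlapping_masks`, the `token_to_atom` alignment list, the `mask_ratio` default, and the choice to mask at the atom level are all assumptions made for illustration, since this summary does not specify how the masked parts are selected. The sketch only shows one simple way to guarantee that no atom is masked in both the SMILES view and the graph view.

```python
import random


def non_overlapping_masks(token_to_atom, num_atoms, mask_ratio=0.25, seed=None):
    """Pick disjoint atom sets to mask in the SMILES view and the graph view.

    token_to_atom[i] is the graph atom index that SMILES token i refers to,
    or None for tokens that are not atoms (bonds, brackets, ring digits).
    This alignment is assumed to be given; it is a hypothetical input.
    """
    if not 0 < mask_ratio <= 0.5:
        raise ValueError("mask_ratio must be in (0, 0.5] so both masks fit")

    rng = random.Random(seed)
    atoms = list(range(num_atoms))
    rng.shuffle(atoms)
    k = max(1, int(round(mask_ratio * num_atoms)))

    # Two disjoint slices of one shuffled list: an atom can land in at most
    # one of them, so the two modalities never hide the same atom.
    graph_masked = set(atoms[:k])
    smiles_atoms = set(atoms[k:2 * k])

    # Translate the SMILES-side atom choice back into token positions.
    smiles_masked = {i for i, a in enumerate(token_to_atom) if a in smiles_atoms}
    return smiles_masked, graph_masked


# Acetic acid, SMILES "CC(=O)O", split into tokens C C ( = O ) O.
token_to_atom = [0, 1, None, None, 2, None, 3]
smiles_masked, graph_masked = non_overlapping_masks(
    token_to_atom, num_atoms=4, mask_ratio=0.25, seed=0
)
print("masked SMILES token positions:", sorted(smiles_masked))
print("masked graph atom indices:", sorted(graph_masked))
```

Because every atom hidden in one view stays visible in the other, a reconstruction objective on such inputs can in principle be solved by looking across modalities, which is the interaction the summary describes.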

Attention-wise masked graph contrastive learning for predicting molecular property

Graph Multi-Similarity Learning for Molecular Property Prediction

Masked Molecule Modeling: A New Paradigm of Molecular Representation Learning for Chemistry Understanding

Self-Supervised Molecular Representation Learning With Topology and Geometry

Multimodal Molecular Pretraining via Modality Blending

MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures

Hierarchical Molecular Graph Self-Supervised Learning for property prediction

Predicting Chemical Properties using Self-Attention Multi-task Learning based on SMILES Representation

Molecular Joint Representation Learning via Multi-modal Information of SMILES and Graphs

A merged molecular representation learning for molecular properties prediction with a web-based service

Multimodal Fusion with Relational Learning for Molecular Property Prediction

Boosting the performance of molecular property prediction via graph–text alignment and multi-granularity representation enhancement

Molecular Joint Representation Learning via Multi-modal Information

A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation

Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction

MPCD: A Multitask Graph Transformer for Molecular Property Prediction by Integrating Common and Domain Knowledge

EMPPNet: Enhancing Molecular Property Prediction via Cross-modal Information Flow and Hierarchical Attention

MvMRL: a multi-view molecular representation learning method for molecular property prediction

Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph

UniMAP: Universal SMILES-Graph Representation Learning