Bidirectional generation of structure and properties through a single molecular foundation model

Jinho Chang, Jong Chul Ye

DOI: https://doi.org/10.1038/s41467-024-46440-3

IF: 16.6

2024-03-14

Nature Communications

Abstract: Recent successes of foundation models in artificial intelligence have prompted the emergence of large-scale chemical pre-trained models. Despite the growing interest in large molecular pre-trained models that provide informative representations for downstream tasks, attempts for multimodal pre-training approaches on the molecule domain were limited. To address this, here we present a multimodal molecular pre-trained model that incorporates the modalities of structure and biochemical properties, drawing inspiration from recent advances in multimodal learning techniques. Our proposed model pipeline of data handling and training objectives aligns the structure/property features in a common embedding space, which enables the model to regard bidirectional information between the molecules’ structure and properties. These contributions emerge synergistic knowledge, allowing us to tackle both multimodal and unimodal downstream tasks through a single model. Through extensive experiments, we demonstrate that our model has the capabilities to solve various meaningful chemical challenges, including conditional molecule generation, property prediction, molecule classification, and reaction prediction.

multidisciplinary sciences

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper proposes SPMM, a Transformer-based multimodal chemical foundation model. It addresses bidirectional generation and prediction between molecular structures and their properties, and handles multiple downstream tasks within a single model. Specifically, the model is trained jointly on two modalities, molecular structures (SMILES) and their biochemical properties (property vector, PV), with the following goals:

1. **Bidirectional Generation and Prediction**:
   - **Generation from SMILES to Properties**: Given a molecular structure, predict its properties.
   - **Generation from Properties to SMILES**: Generate a molecular structure that matches the desired property conditions.
2. **Multimodal Learning**:
   - Use the self-attention and cross-attention mechanisms of the Transformer architecture to capture the relationships between molecular structures and properties.
   - Align the features of the two modalities in a shared embedding space through contrastive learning.
3. **Single-Modal Tasks**:
   - Perform tasks such as property prediction and molecular classification using only molecular structures.
   - Perform molecular generation tasks using only properties.

According to the paper, the model performs well on a range of chemical tasks, including conditional molecule generation, property prediction, molecule classification, and reaction prediction, and extensive experiments support its effectiveness and generalization ability. The model was pre-trained on only 50 million molecules, a small dataset compared to those of other large pre-trained models, which suggests room for further improvement.
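The contrastive alignment step mentioned above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic symmetric InfoNCE-style loss, in which the SMILES embedding and the PV embedding of the same molecule form a positive pair and all other pairings in the batch are negatives. The function names, the toy 2-dimensional embeddings, and the temperature value of 0.07 are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(smiles_emb, pv_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    smiles_emb[i] and pv_emb[i] are assumed to describe the same molecule.
    The temperature is an illustrative default, not a value from the paper.
    """
    s = [l2_normalize(v) for v in smiles_emb]
    p = [l2_normalize(v) for v in pv_emb]
    # Similarity of every SMILES embedding to every PV embedding.
    s2p = [[sum(a * b for a, b in zip(si, pj)) / temperature for pj in p] for si in s]
    # Transpose to get PV-to-SMILES similarities.
    p2s = [list(col) for col in zip(*s2p)]
    return 0.5 * (cross_entropy_diag(s2p) + cross_entropy_diag(p2s))


# Toy batch: three molecules with made-up 2-D embeddings.
smiles_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = contrastive_loss(smiles_emb, smiles_emb)
shuffled = contrastive_loss(smiles_emb, smiles_emb[1:] + smiles_emb[:1])
print(aligned < shuffled)
```

In this toy batch the loss is lower when each SMILES embedding is paired with its own property embedding than when the pairs are shuffled. That gap is the signal a contrastive objective uses to pull matching structure and property features together in a shared space.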

PrefixMol: Target- and Chemistry-aware Molecule Design Via Prefix Embedding

Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model

A Large Encoder-Decoder Family of Foundation Models For Chemical Language

Multimodal Molecular Pretraining via Modality Blending

Flexible Dual-Branched Message-Passing Neural Network for a Molecular Property Prediction

Predicting Structure‐dependent Properties Directly from the Three Dimensional Molecular Images Via Convolutional Neural Networks

MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures

Multimodal Fusion with Relational Learning for Molecular Property Prediction

Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model

Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction

Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph

Multi-view biomedical foundation models for molecule-target and property prediction

Improving Molecular Properties Prediction Through Latent Space Fusion

Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations

MolPROP: Molecular Property prediction with multimodal language and graph fusion

MolCloze - A Unified Cloze-style Self-supervised Molecular Structure Learning Model for Chemical Property Prediction.

Large-scale chemical language representations capture molecular structure and properties

Chemical Language Model Linker: blending text and molecules with modular adapters

Automated 3D Pre-Training for Molecular Property Prediction