LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space

Jinho Chang, Jong Chul Ye
2024-10-03
Abstract: With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques with conditional diffusion models. However, the unavoidable discreteness of a molecule makes it difficult for a diffusion model to connect raw data with highly complex conditions like natural language. To address this, we present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation. LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space, and a natural language-conditioned latent diffusion model. In particular, recognizing that multiple SMILES notations can represent the same molecule, we employ a contrastive learning strategy to extract a feature space that is aware of the unique characteristics of the molecule structure. LDMol outperforms the existing baselines on the text-to-molecule generation benchmark, suggesting that diffusion models can potentially outperform autoregressive models in text data generation with a better choice of the latent domain. Furthermore, we show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing, demonstrating its versatility as a diffusion model.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses a key issue in molecule generation: how to use diffusion models to generate valid molecular structures conditioned on natural language. Specifically:

1. **The Discreteness of Molecular Data**: Molecular data are inherently discrete (atom types, bond types, and connectivity), while diffusion models have been studied primarily for continuous data domains such as images. This makes it challenging to apply diffusion models directly to molecule generation.
2. **Generation from Text to Molecules**: Most current diffusion-based methods handle only relatively simple chemical or biological conditions, and their ability to generate molecules from natural language conditions is weak. Autoregressive models have performed better on this task.

To overcome these issues, the authors propose a new latent diffusion model, LDMol, for text-conditioned molecule generation. LDMol comprises a molecule autoencoder that produces a learnable, structurally informative feature space, and a latent diffusion model conditioned on natural language. Because multiple SMILES notations can represent the same molecule, a contrastive learning strategy is used to extract features that reflect the molecular structure itself rather than one particular notation, which helps the model generate valid molecules that satisfy the given text condition.

Experimental results show that LDMol outperforms existing baselines on the text-to-molecule generation benchmark, suggesting that diffusion models can outperform autoregressive models when the latent domain is chosen well. The paper also demonstrates applications to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
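
The contrastive idea described above can be illustrated with a generic InfoNCE-style objective. This is a minimal sketch and not the authors' implementation: the summary does not specify the loss, the encoder, or the temperature, so the function names `cosine` and `info_nce`, the temperature value, and the toy 2-D vectors are all assumptions. The vectors stand in for encoder outputs of two SMILES strings per molecule (for example, `CCO` and `OCC` both denote ethanol), where row `i` in each view is assumed to come from the same molecule.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over a batch (generic form, assumed for illustration).

    Row i of view_a and row i of view_b are treated as a positive pair
    (two notations of the same molecule); every other row of view_b acts
    as a negative for row i of view_a.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the positive pair as the target class.
        total += log_sum - logits[i]
    return total / n


# Hypothetical embeddings for two molecules, each seen through two notations.
view_a = [[1.0, 0.0], [0.0, 1.0]]
matched_b = [[0.9, 0.1], [0.1, 0.9]]      # same molecule lands nearby
mismatched_b = [[0.1, 0.9], [0.9, 0.1]]   # same molecule lands far away

loss_matched = info_nce(view_a, matched_b)
loss_mismatched = info_nce(view_a, mismatched_b)
print(loss_matched < loss_mismatched)
```

The loss is small when two notations of the same molecule map to nearby latent vectors and large when they do not, which is the property a notation-invariant feature space needs. A real setup would replace the toy vectors with outputs of a trained SMILES encoder.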