Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model

Yida Xiong, Kun Li, Weiwei Liu, Jia Wu, Bo Du, Shirui Pan, Wenbin Hu

2024-10-17

Abstract: Molecular optimization (MO) is a crucial stage in drug discovery in which task-oriented generated molecules are optimized to meet practical industrial requirements. Existing mainstream MO approaches primarily use external property predictors to guide iterative property optimization. However, it is unrealistic for predictors to learn all molecular samples in the vast chemical space. As a result, errors and noise are inevitably introduced during property prediction because the predictor is only an approximation. This leads to discrepancy accumulation, reduced generalization, and suboptimal molecular candidates. In this paper, we propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM). TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions, thereby preventing error propagation during the diffusion process. Guided by physically and chemically detailed textual descriptions, TransDLM samples and optimizes encoded source molecules, retaining the core scaffolds of the source molecules and ensuring structural similarity. Moreover, TransDLM enables simultaneous sampling of multiple molecules, making it well suited to scalable, efficient large-scale optimization through distributed computation on web platforms. Furthermore, our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset. The code is available at: https://anonymous.4open.science/r/TransDLM-A901

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses how to optimize generated molecules effectively so that they meet practical industrial requirements in drug discovery. Existing mainstream molecular optimization methods rely mainly on external property predictors to guide iterative property optimization. Because chemical space is vast and complex, a predictor cannot realistically learn all molecular samples, so property prediction inevitably introduces errors and noise. These errors accumulate across iterations, degrade the quality of the optimization results, and lead to suboptimal molecular candidates. In addition, traditional molecular optimization relies mainly on chemists' experience, knowledge, and intuition, which makes the process time-consuming and makes it difficult to find an ideal molecule within a limited time. To address these challenges, the paper proposes a text-guided multi-property molecular optimization method that uses a transformer-based diffusion language model (TransDLM). TransDLM uses standardized chemical nomenclature as the semantic representation of molecules and implicitly embeds property requirements into textual descriptions, thereby preventing error propagation during the diffusion process. Guided by physically and chemically detailed textual descriptions, TransDLM samples and optimizes the encoded source molecules while retaining their core scaffolds and ensuring structural similarity. Moreover, TransDLM supports sampling multiple molecules simultaneously, which makes it suitable for scalable, efficient large-scale optimization through distributed computation on web platforms. Experimental results show that TransDLM outperforms state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
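
The summary above does not say how structural similarity between a source molecule and its optimized counterpart is measured. As a rough illustration only, the sketch below computes the Tanimoto (Jaccard) coefficient, a measure commonly used for this purpose in cheminformatics, over two sets of structural features. The feature names and the `tanimoto` helper are hypothetical and are not taken from the paper; in practice the sets would usually be fingerprint bits produced by a cheminformatics toolkit.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity: |A & B| / |A | B|."""
    if not a and not b:
        # Assumed convention: two empty feature sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical feature sets standing in for molecular fingerprints.
source = {"benzene_ring", "amide", "hydroxyl", "methyl"}
optimized = {"benzene_ring", "amide", "fluoro", "methyl"}

# 3 shared features out of 5 distinct features overall.
score = tanimoto(source, optimized)
print(f"{score:.2f}")  # prints 0.60
```

A score near 1 means the optimized molecule keeps most of the source molecule's features, which is the sense in which an optimizer can be said to retain the core scaffold.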
