Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties

Siyuan Guo, Jihong Guan, Shuigeng Zhou
2023-10-05
Abstract: In the past decade, AI-driven drug design and discovery has been a hot research topic, and an important branch of it is molecule generation by generative models, from GAN-based and VAE-based models to the latest diffusion-based models. However, most existing models pursue only basic properties of the generated molecules, such as validity and uniqueness, and only a few go further to explicitly optimize a single important molecular property (e.g. QED or PlogP), so most generated molecules are of little practical use. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and that molecular properties are usually determined by some substructures (e.g. pharmacophores), we propose to perform diffusion on two structural levels, molecules and molecular fragments, which yields a mixed Gaussian distribution for the reverse diffusion process. To get desirable molecular fragments, we develop a novel electronic-effect-based fragmentation method. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity with an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments on two benchmark datasets, QM9 and ZINC250k, show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.
Biomolecules, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses a limitation of current molecule generation models: most pursue only basic properties of the generated molecules (such as chemical validity and uniqueness), and few can simultaneously optimize multiple important molecular properties, e.g. the Quantitative Estimate of Drug-likeness (QED) and penalized logP (PlogP). Its goal is therefore a new method for generating molecules with multiple desired properties. The main contributions are:
1. **Multi-attribute optimization**: A novel method is proposed to generate molecules with multiple desired properties, presented as the first attempt to simultaneously optimize multiple properties in generated molecules.
2. **Dual-level diffusion model**: The diffusion model framework is extended by diffusing at two structural levels (whole molecules and molecular fragments), which gives a mixture of Gaussian distributions for the reverse process.
3. **Molecular fragmentation method based on electronic effects**: A new fragmentation method, based on electronic effects, is developed to select fragments closely related to molecular properties.
4. **Multi-objective optimization strategy**: Two strategies are used to optimize multiple properties of the generated molecules. One uses an energy-guidance function to optimize molecular validity; the other uses a multi-objective mechanism to optimize multiple properties simultaneously.
Experimental results show that the proposed model outperforms current state-of-the-art models on the benchmark datasets QM9 and ZINC250k, achieving 100% on basic properties such as validity and uniqueness and significantly outperforming other models on key properties such as QED and PlogP. This supports the effectiveness of the method for generating high-quality molecules with multiple desired properties.
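The summary above does not reproduce the paper's equations, so the following is only a toy sketch of how a reverse step over a two-component Gaussian mixture with energy guidance might look, not the authors' implementation. All names (`guided_mean`, `reverse_step`, `w_frag`, `scale`) are hypothetical, the component means stand in for the outputs of molecule-level and fragment-level denoisers, and the mean shift of `-scale * sigma**2 * grad` is borrowed from the standard classifier-guidance recipe as an assumption.

```python
import random

def guided_mean(mu, grad, sigma, scale):
    """Shift a Gaussian mean down the energy gradient (assumed: lower energy = more valid)."""
    return [m - scale * sigma * sigma * g for m, g in zip(mu, grad)]

def reverse_step(x, mu_mol, mu_frag, w_frag, sigma, energy_grad, scale, rng):
    """One toy reverse-diffusion step sampled from a two-component Gaussian mixture."""
    # Choose the fragment-level component with probability w_frag,
    # otherwise the molecule-level component.
    mu = mu_frag if rng.random() < w_frag else mu_mol
    # Apply energy guidance evaluated at the current noisy state x.
    mean = guided_mean(mu, energy_grad(x), sigma, scale)
    # Add isotropic Gaussian noise with standard deviation sigma.
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

# Toy usage: a quadratic energy E(v) = 0.5 * ||v||^2, whose gradient is v itself.
rng = random.Random(0)
x = [1.0, -1.0]
x_next = reverse_step(x, mu_mol=[0.8, -0.8], mu_frag=[0.6, -0.9],
                      w_frag=0.5, sigma=0.1, energy_grad=lambda v: list(v),
                      scale=1.0, rng=rng)
print(x_next)
```

In a real model the two means would come from trained networks and the energy from a learned validity function; the sketch only shows how the mixture choice and the guidance term combine in a single step.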