Improving Structural Plausibility in 3D Molecule Generation via Property-Conditioned Training with Distorted Molecules

Lucy Vost, Vijil Chenthamarakshan, Payel Das, Charlotte M Deane
DOI: https://doi.org/10.1101/2024.09.17.613136
2024-09-21
Abstract: Traditional drug design methods are costly and time-consuming due to their reliance on trial-and-error processes. As a result, computational methods for molecule generation, including diffusion models, have gained significant traction. Despite their potential, they have faced criticism for producing physically implausible outputs. We alleviate this problem by conditionally training a diffusion model capable of generating molecules of varying and controllable levels of chemical plausibility. This is achieved by adding distorted molecules to training datasets, and then annotating each molecule with a label representing the extent of its distortion, and hence its quality. By training the model to distinguish between favourable and unfavourable molecular conformations alongside the standard molecule generation training process, we can selectively sample molecules from the high-quality region of learned space, resulting in improvements in the validity of generated molecules. In addition to the two standard datasets used by molecule generation methods (QM9 and GEOM), we also test our method on a druglike dataset derived from ZINC. We use our conditional method with EDM, the first E(3) equivariant diffusion model for molecule generation, as well as two further models built off EDM: a more recent diffusion model and a flow matching model. We demonstrate improvements in validity as assessed by RDKit parsability and the PoseBusters test suite; more broadly, though, our findings highlight the effectiveness of conditioning methods on low-quality data to improve the sampling of high-quality data.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses the problem of improving structural plausibility in generated 3D molecules. Traditional drug design is costly and time-consuming because it relies on trial and error. Computational methods, including diffusion models, have been developed to generate molecules instead, but the molecules they produce are often physically implausible. To mitigate this, the authors conditionally train a diffusion model so that it can generate 3D molecules at varying, controllable levels of structural plausibility. Specifically, they add distorted molecules to the training dataset and label each molecule with its degree of distortion, so the model learns to distinguish high-quality from low-quality conformations alongside the usual generation objective. Sampling from the high-quality region then improves the validity of the generated molecules, as assessed by RDKit parsability and the PoseBusters test suite. The approach is tested on QM9, GEOM, and a druglike dataset derived from ZINC, and with three models: EDM, GCDM, and EquiFM. More broadly, the results suggest that conditioning on low-quality data can improve the sampling of high-quality data.
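The data-augmentation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that distortion means adding zero-mean Gaussian noise to atomic coordinates and that the label is simply the noise scale, whereas the paper's actual distortion procedure and label definition may differ. The function names `distort` and `build_augmented_dataset` and the toy geometry are hypothetical.

```python
import random


def distort(coords, scale, rng):
    """Return a copy of `coords` with Gaussian noise (std = `scale`) added to every coordinate.

    Assumption: Gaussian displacement stands in for whatever distortion the paper uses.
    """
    return [tuple(c + rng.gauss(0.0, scale) for c in atom) for atom in coords]


def build_augmented_dataset(molecules, scales, seed=0):
    """Pair each molecule with distorted copies, labelling every entry by its distortion level.

    Undistorted molecules get label 0.0; each distorted copy is labelled with the
    noise scale that produced it (an assumed proxy for "quality").
    """
    rng = random.Random(seed)
    dataset = []
    for coords in molecules:
        dataset.append({"coords": list(coords), "distortion": 0.0})
        for scale in scales:
            dataset.append({"coords": distort(coords, scale, rng), "distortion": scale})
    return dataset


# Toy example: an approximate three-atom geometry (coordinates in angstroms, illustrative only).
toy_molecule = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
augmented = build_augmented_dataset([toy_molecule], scales=[0.1, 0.25, 0.5])

for entry in augmented:
    print(entry["distortion"], entry["coords"][1])
```

In a conditional model trained on such a dataset, the `distortion` value would be supplied as the conditioning property during training, and sampling would then request the lowest distortion level (0.0 in this sketch) to target the high-quality region.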