Abstract: One of the major applications of generative models in drug discovery targets the lead-optimization phase. During the optimization of a lead series, scaffold constraints are commonly imposed on the structure of the designed molecules. Without enforcing such constraints, the probability of generating molecules with the required scaffold is extremely low, which hinders the practicality of generative models for de novo drug design. To tackle this issue, we introduce a new algorithm, named SAMOA (Scaffold Constrained Molecular Generation), to perform scaffold-constrained in silico molecular design. We build on the well-known SMILES-based Recurrent Neural Network (RNN) generative model and modify its sampling procedure to achieve scaffold-constrained generation. We directly benefit from the associated reinforcement learning methods, which allow us to design molecules optimized for different properties while exploring only the relevant chemical space. We showcase the method's ability to perform scaffold-constrained generation on various tasks: designing novel molecules around scaffolds extracted from SureChEMBL chemical series, generating novel active molecules for the Dopamine Receptor D2 (DRD2) target, and finally, designing predicted actives on the MMP-12 series, an industrial lead-optimization project.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.0c01015. It provides the availability of the data used to train models and run experiments; an algorithmic description of naive policy masking (Algorithm S1); an algorithmic description of hill-climbing (Algorithm S2); examples of molecules obtained after sampling the core of fexofenadine (Figure S1 for the sampled molecules, Figure S2 for the original molecule); examples of molecules obtained after replacing the ring system of fexofenadine (Figure S3); distributions of ClogP, molecular weight, QED, and SAS for reference molecules, molecules where a branched decoration was sampled, and molecules where the core was sampled (Figure S4); the 17 validation scaffolds from SureChEMBL (Figure S5); histograms of Tanimoto similarities between generated molecules for each of the DRD2 scaffolds (Figure S6); and the uniqueness of sampled decorations for each open position of the DRD2 scaffolds (Figure S7) (PDF).
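To make the idea of a modified sampling procedure concrete, the following is a minimal Python sketch of scaffold-constrained sampling. It is not the authors' implementation and does not reproduce Algorithm S1. It rests on several assumptions introduced only for illustration: open positions in the scaffold are written as `*`, the trained RNN is replaced by a hypothetical `toy_policy` that samples uniformly, the vocabulary is restricted to three atom tokens, and `END` is a hypothetical end-of-decoration token. Scaffold characters are copied verbatim, and the policy is queried only at open positions.

```python
import random

# Assumption: a tiny decoration vocabulary; a real model would use a full SMILES vocabulary.
ATOMS = ["C", "N", "O"]
END = "<end>"  # hypothetical token that closes a decoration


def toy_policy(prefix, allowed, rng):
    """Stand-in for a trained RNN: ignores the prefix and samples uniformly
    from the tokens that are allowed at this step."""
    return rng.choice(allowed)


def sample_with_scaffold(scaffold, policy=toy_policy, max_len=3, seed=0):
    """Generate one string that keeps every scaffold character and fills
    each open position ('*') with a sampled decoration."""
    rng = random.Random(seed)
    out = []
    for ch in scaffold:
        if ch != "*":
            # Constrained step: the scaffold character is forced, not sampled.
            out.append(ch)
            continue
        # Open position: sample tokens until END or the length limit.
        decoration = []
        while len(decoration) < max_len:
            # END is masked at the first step so a decoration is never empty.
            allowed = ATOMS if not decoration else ATOMS + [END]
            token = policy(out + decoration, allowed, rng)
            if token == END:
                break
            decoration.append(token)
        out.extend(decoration)
    return "".join(out)


if __name__ == "__main__":
    # A benzene ring with two open positions (illustrative scaffold only).
    print(sample_with_scaffold("c1ccc(*)cc1*", seed=0))
```

In this sketch the constraint holds by construction, because the policy never gets to choose a scaffold token. A realistic version would also need to track ring-closure digits and branch parentheses inside decorations and check the result with a cheminformatics toolkit, none of which is attempted here.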
