Abstract: In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also show examples of ACEGEN applied in multiple drug discovery case studies. ACEGEN is accessible at https://github.com/acellera/acegen-open and available for use under the MIT license.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses the challenges of molecule generation and optimization in drug design. Specifically, it aims to improve reinforcement learning (RL)-based molecule generation by introducing a new toolkit named ACEGEN. The main problems it targets are:

1. **Balance between complexity and efficiency**: Existing RL algorithms struggle to balance flexibility, reliability, and efficiency when applied to drug design. Because advanced RL algorithms are complex and depend heavily on specialized code, efficient and reliable molecule generation and optimization is hard to achieve.
2. **Effective exploration of chemical space**: Drug design requires finding molecules with optimal properties in a chemical space that is too large to enumerate, so traditional methods have difficulty searching it effectively. RL offers a potential solution, but exploring chemical space efficiently remains a key issue.
3. **Design of customized reward functions**: The goals of drug design are often multifaceted (such as potency, selectivity, bioavailability, and toxicity), while existing scoring functions usually define these goals only approximately. Designing a reward function that reflects real-world requirements and can guide RL algorithms is an important challenge.
4. **Limitations of existing RL implementations**: Current RL implementations for drug discovery depend heavily on custom code, which leads to redundancy, complexity, and inefficiency and limits the integration of diverse solutions. The paper proposes building drug-design agents from the well-tested, reusable components of the TorchRL library instead.
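The reward-design problem in point 3 can be made concrete with a small sketch. One common way to fold several objectives into a single scalar reward is a weighted geometric mean of per-objective scores, each already normalized to [0, 1]. The function name, the objective names, and the numbers below are hypothetical and chosen only for illustration; this is not ACEGEN's or MolScore's scoring code.

```python
import math


def weighted_geometric_mean(scores, weights):
    """Combine per-objective scores in [0, 1] into one reward in [0, 1].

    Hypothetical helper for illustration; not part of ACEGEN or MolScore.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must name the same objectives")
    if any(not 0.0 <= s <= 1.0 for s in scores.values()):
        raise ValueError("every score must lie in [0, 1]")
    active = [name for name in scores if weights[name] > 0]
    total_weight = sum(weights[name] for name in active)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    # A zero on any weighted objective vetoes the molecule outright,
    # which is the main reason to prefer this over an arithmetic mean.
    if any(scores[name] == 0.0 for name in active):
        return 0.0
    log_sum = sum(weights[name] * math.log(scores[name]) for name in active)
    return math.exp(log_sum / total_weight)


# Made-up, pre-normalized scores for a single candidate molecule.
example_scores = {
    "potency": 0.9,
    "selectivity": 0.8,
    "bioavailability": 0.5,
    "non_toxicity": 0.9,  # toxicity inverted so that higher is better
}
example_weights = {
    "potency": 2.0,
    "selectivity": 1.0,
    "bioavailability": 1.0,
    "non_toxicity": 1.0,
}
reward = weighted_geometric_mean(example_scores, example_weights)
```

Because the aggregate is multiplicative, a molecule cannot compensate for a failed objective by excelling at another, which is often the behavior wanted when one property (for example toxicity) is a hard requirement.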
### ACEGEN's solutions

To address these challenges, the paper proposes ACEGEN, a comprehensive and streamlined toolkit built on TorchRL specifically for generative drug design. Its main features are:

- **Modularity and reusability**: It uses the well-tested, reusable components provided by TorchRL, which makes development more flexible and efficient.
- **Support for multiple RL algorithms**: It implements several RL algorithms (such as REINFORCE, A2C, and PPO) and compares their performance in benchmarks.
- **Diverse generation modes**: It supports de novo generation, scaffold modification, and fragment linking to meet different drug-design requirements.
- **Chemical language models**: It uses chemical language models (CLMs) for molecule generation and supports multiple string representations (such as SMILES, DeepSMILES, and SELFIES).
- **Scoring and evaluation functions**: It lets users define custom scoring functions and integrates the wide range of scoring functions and diversity filters available in the MolScore library.

Through these improvements, ACEGEN aims to make RL more effective in drug design, particularly in terms of sample efficiency, maximum performance, and exploration ability.
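To illustrate the simplest of the listed algorithms, the following is a minimal REINFORCE sketch in plain Python. It is a toy under stated assumptions: the "policy" is a single context-free softmax over a three-token vocabulary rather than a real chemical language model, the reward is an invented stand-in (the fraction of `C` tokens), and none of the names correspond to ACEGEN or TorchRL APIs.

```python
import math
import random

VOCAB = ["C", "N", "O"]  # toy token set, not a real SMILES vocabulary
SEQ_LEN = 5


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits, rng):
    """Sample SEQ_LEN token indices independently from the policy."""
    return rng.choices(range(len(VOCAB)), weights=softmax(logits), k=SEQ_LEN)


def toy_reward(tokens):
    """Invented reward: fraction of sampled tokens equal to 'C'."""
    return sum(1 for t in tokens if VOCAB[t] == "C") / len(tokens)


def reinforce(steps=200, batch_size=16, lr=0.5, seed=0):
    """Run REINFORCE with a batch-mean baseline; return final token probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(VOCAB)
    for _ in range(steps):
        batch = [sample_sequence(logits, rng) for _ in range(batch_size)]
        rewards = [toy_reward(seq) for seq in batch]
        baseline = sum(rewards) / len(rewards)  # variance reduction
        probs = softmax(logits)
        grad = [0.0] * len(VOCAB)
        for seq, reward in zip(batch, rewards):
            advantage = reward - baseline
            for tok in seq:
                for k in range(len(VOCAB)):
                    # d log pi(tok) / d logit_k = 1[k == tok] - pi(k)
                    indicator = 1.0 if k == tok else 0.0
                    grad[k] += advantage * (indicator - probs[k])
        # Gradient ascent on expected reward.
        logits = [l + lr * g / batch_size for l, g in zip(logits, grad)]
    return softmax(logits)


final_probs = reinforce()
```

In a real setting the context-free logits would be replaced by a sequence model conditioned on the tokens generated so far, and the toy reward by a scoring function over the decoded molecule; the update rule keeps the same form.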

ACEGEN: Reinforcement learning of generative chemical agents for drug discovery

Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement Learning

Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

Sample Efficient Reinforcement Learning with Active Learning for Molecular Design

Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning

De novo Drug Design using Reinforcement Learning with Multiple GPT Agents

ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation

Diversity oriented Deep Reinforcement Learning for targeted molecule generation

DrugGen: Advancing Drug Discovery with Large Language Models and Reinforcement Learning Feedback

Genetic Algorithm-Based Receptor Ligand: A Genetic Algorithm-Guided Generative Model to Boost the Novelty and Drug-Likeness of Molecules in a Sampling Chemical Space.

Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation

ChemistGA: A Chemical Synthesizable Accessible Molecular Generation Algorithm for Real-World Drug Discovery

ClickGen: Directed Exploration of Synthesizable Chemical Space Via Modular Reactions and Reinforcement Learning

Generative artificial intelligence for small molecule drug design

De novo drug design using reinforcement learning with graph-based deep generative models

De novo drug design as GPT language modeling: large chemistry models with supervised and reinforcement learning

Optimizing Drug Design by Merging Generative AI With Active Learning Frameworks

FREED++: Improving RL Agents for Fragment-Based Molecule Generation by Thorough Reproduction

Evaluation of Reinforcement Learning in Transformer-based Molecular Design

Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation