Albert Bou, Morgan Thomas, Sebastian Dittert, Carles Navarro Ramírez, Maciej Majewski, Ye Wang, Shivam Patel, Gary Tresadern, Mazen Ahmad, Vincent Moens, Woody Sherman, Simone Sciabola, Gianni De Fabritiis
Abstract: In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also show examples of ACEGEN applied in multiple drug discovery case studies. ACEGEN is accessible at https://github.com/acellera/acegen-open and available for use under the MIT license.
### What problems does this paper attempt to solve?
This paper addresses the challenges of molecule generation and optimization in drug design. Specifically, it seeks to improve reinforcement learning (RL)-based molecule generation by introducing a new toolkit named ACEGEN. The main problems it targets are:
1. **Balance between complexity and efficiency**:
- RL approaches to drug design struggle to balance capabilities, flexibility, reliability, and efficiency. Because advanced RL algorithms are complex and depend heavily on specialized code, efficient and reliable molecule generation and optimization is hard to achieve.
2. **Effective exploration of chemical space**:
- Drug design requires finding molecules with optimal properties in a vast chemical space. Traditional methods struggle to search this space because it is too large to enumerate. RL offers a potential solution, but exploring chemical space efficiently remains a key issue.
3. **Design of customized reward functions**:
- In drug design, the goals are often multifaceted (such as potency, selectivity, bioavailability, and toxicity), while existing scoring functions usually capture these goals only approximately. Designing a reward function that reflects real-world requirements and guides the RL algorithm is an important challenge.
4. **Limitations of existing RL implementations**:
- Current RL implementations for drug discovery depend heavily on custom code, which leads to redundancy, complexity, and inefficiency and limits the integration of diverse solutions. The paper proposes building drug-design agents from the tested, reusable components in the TorchRL library.
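The multi-objective reward problem in item 3 can be illustrated with a minimal sketch. It assumes each objective has already been normalized to a score in [0, 1] and combines the scores with a weighted geometric mean, one common aggregation choice. The function name `aggregate_reward` and the objective names are hypothetical and are not taken from ACEGEN or MolScore.

```python
import math


def aggregate_reward(scores, weights):
    """Combine per-objective scores in [0, 1] into a single reward.

    Uses a weighted geometric mean, so a molecule that fails any
    weighted objective (score 0) receives a reward of 0.
    Objective names and weights are illustrative assumptions.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    log_sum = 0.0
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name!r} must lie in [0, 1]")
        weight = weights[name]
        if weight == 0:
            continue  # objective is ignored entirely
        if value == 0.0:
            return 0.0  # avoid log(0); the geometric mean is zero
        log_sum += weight * math.log(value)
    return math.exp(log_sum / total_weight)


# Hypothetical usage: potency counts twice as much as selectivity.
example_reward = aggregate_reward(
    {"potency": 0.8, "selectivity": 0.5},
    {"potency": 2.0, "selectivity": 1.0},
)
```

A geometric mean penalizes molecules that do well on one objective while failing another, which an arithmetic mean would let through. Whether that trade-off is appropriate depends on the design goal.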
### ACEGEN's solutions
To address these challenges, the paper proposes ACEGEN, a comprehensive and streamlined toolkit for generative drug design built on TorchRL. Its main features are:
- **Modularity and reusability**: Uses the well-tested, reusable components provided by TorchRL to make development more flexible and efficient.
- **Support for multiple RL algorithms**: Implements multiple RL algorithms (such as REINFORCE, A2C, and PPO) and demonstrates their performance through benchmarks.
- **Multiple generation modes**: Supports de novo generation, scaffold decoration, and fragment linking to meet different drug-design requirements.
- **Chemical language models**: Uses chemical language models (CLMs) for molecule generation and supports multiple string representations (such as SMILES, DeepSMILES, and SELFIES).
- **Scoring and evaluation functions**: Lets users define custom scoring functions and integrates a wide range of scoring functions and diversity filters from the MolScore library.
Through these features, ACEGEN aims to make RL more effective in drug design, especially in terms of sample efficiency, maximum performance, and exploration ability.
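To make the interplay between a generative policy, a scoring function, and an RL update concrete, here is a toy REINFORCE loop in pure Python. It is only a sketch under strong assumptions: the "policy" is a single categorical distribution over a three-token vocabulary rather than a neural CLM, `toy_score` is a hypothetical stand-in for a real scoring function, and none of the names correspond to the ACEGEN or TorchRL APIs.

```python
import math
import random

VOCAB = ["C", "N", "O"]  # toy vocabulary, not a real SMILES alphabet
SEQ_LEN = 5              # fixed sequence length for simplicity


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_score(tokens):
    """Hypothetical scoring function: fraction of 'N' tokens."""
    return tokens.count("N") / len(tokens)


def train(steps=300, batch_size=32, lr=0.2, seed=0):
    """Run REINFORCE with a batch-mean baseline; return final token probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(VOCAB)  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a batch of token-index sequences from the current policy.
        batch = [
            rng.choices(range(len(VOCAB)), weights=probs, k=SEQ_LEN)
            for _ in range(batch_size)
        ]
        rewards = [toy_score([VOCAB[i] for i in seq]) for seq in batch]
        baseline = sum(rewards) / batch_size  # reduces gradient variance

        # Accumulate (reward - baseline) * grad log p(sequence).
        # For a softmax policy, d log p(token) / d logit_k = 1[k == token] - p_k.
        grad = [0.0] * len(VOCAB)
        for seq, reward in zip(batch, rewards):
            advantage = reward - baseline
            for token in seq:
                for k in range(len(VOCAB)):
                    indicator = 1.0 if k == token else 0.0
                    grad[k] += advantage * (indicator - probs[k])

        # Gradient ascent on the expected reward.
        logits = [w + lr * g / batch_size for w, g in zip(logits, grad)]
    return softmax(logits)
```

Calling `train()` returns the learned token probabilities, which should shift toward the token that `toy_score` rewards. In a real setting the categorical policy would be replaced by a sequence model and the toy score by a chemistry-aware scoring function.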