Reinvent 4: Modern AI–driven generative molecule design

Hannes H. Loeffler, Jiazhen He, Alessandro Tibo, Jon Paul Janet, Alexey Voronov, Lewis H. Mervin, Ola Engkvist
DOI: https://doi.org/10.1186/s13321-024-00812-5
2024-02-23
Journal of Cheminformatics
Abstract: REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI-based molecule generation. An additional goal with the release is to create a framework for education and future innovation in AI-based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution: The software provides an open-source reference implementation for generative molecular design where the software is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code and full documentation thereof will increase transparency of AI and foster innovation, collaboration and education.
chemistry, multidisciplinary, computer science, interdisciplinary applications, information systems
What problem does this paper attempt to address?
The paper presents REINVENT 4, an open-source generative AI framework for designing small molecules with specific properties. The framework drives molecule generation with deep-learning models (recurrent neural networks and transformer architectures) and embeds these generators in general machine-learning optimization algorithms: transfer learning, reinforcement learning, and curriculum learning.

### Main Problems and Solutions

1. **De novo design**:
   - Generate new molecular structures without starting from an existing input molecule.
   - Use the framework's RNN- and transformer-based generators to explore new regions of chemical space.
2. **R-group replacement**:
   - Replace specific parts of a molecule to optimize its properties.
   - Vary selected substituents to find better drug candidates.
3. **Library design**:
   - Design a compound library containing multiple molecules to accelerate drug screening.
   - Use generative models to produce a large number of potential drug molecules quickly.
4. **Linker design**:
   - Design linkers that connect molecular fragments into more complex structures.
   - Ensure that the linker meets the physicochemical requirements placed on drug molecules.
5. **Scaffold hopping**:
   - Find new molecules with biological activity similar to that of existing molecules but with different structures.
   - Change the molecular scaffold to explore new chemical space.
6. **Molecule optimization**:
   - Optimize molecular properties such as pharmacokinetics and pharmacodynamics (PK/PD), toxicology, and synthetic feasibility.
   - Use methods such as reinforcement learning to improve the molecular structure iteratively until it better meets the requirements of drug development.

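Item 6 names reinforcement learning as a way to improve molecules iteratively. The Python sketch below shows one way a score can be turned into a training signal. It assumes the squared-difference loss between an "augmented" log-likelihood and the agent's log-likelihood that the REINVENT literature describes; the summary above does not state this formula, and the function names, the value of `sigma`, and the numbers used here are illustrative assumptions rather than REINVENT 4's code.

```python
def augmented_log_likelihood(prior_log_prob: float, score: float, sigma: float) -> float:
    """Raise the prior log-likelihood in proportion to the molecule's score.

    prior_log_prob: log-likelihood of the molecule under the fixed prior model
    score:          value in [0, 1] from a scoring function (higher is better)
    sigma:          scalar controlling how strongly the score is weighted
    """
    return prior_log_prob + sigma * score


def agent_loss(agent_log_prob: float, prior_log_prob: float,
               score: float, sigma: float) -> float:
    """Squared difference between the augmented target and the agent's log-likelihood."""
    target = augmented_log_likelihood(prior_log_prob, score, sigma)
    return (target - agent_log_prob) ** 2


# Two hypothetical molecules that the prior and the agent rate identically.
# Only the scored molecule produces a non-zero loss, so only it would
# drive a parameter update.
loss_unscored = agent_loss(agent_log_prob=-40.0, prior_log_prob=-40.0,
                           score=0.0, sigma=10.0)
loss_scored = agent_loss(agent_log_prob=-40.0, prior_log_prob=-40.0,
                         score=0.5, sigma=10.0)
```

Minimizing this loss moves the agent's log-likelihood toward the augmented target, so molecules with higher scores become more likely to be sampled in later iterations, while the prior term keeps the agent close to what the pre-trained model considers plausible.
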
### Key Technologies and Methods

- **Generative models**: Sequence models that operate on SMILES strings, such as RNNs and transformers, generate the molecules.
- **Reinforcement Learning (RL)**: A reward mechanism guides the model toward molecules with specific properties.
- **Transfer Learning (TL)**: The knowledge in a pre-trained model is reused to adapt quickly to new tasks or datasets.
- **Curriculum Learning (CL)**: Task difficulty is increased gradually to help the model learn complex tasks.

### Goals

REINVENT 4 aims to provide reference implementations of the most common algorithms in AI-based molecule generation and to serve as a framework for education, innovation, and collaboration in AI-based molecular design. The software is a command-line tool that reads a user configuration file in TOML or JSON format, and it is published with full documentation. With these capabilities, REINVENT 4 aims to accelerate drug discovery and make molecular design more efficient.
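
The curriculum learning entry under Key Technologies can be illustrated with a toy staged loop in which a run moves on to a harder stage only after the mean score clears the current stage's threshold. This is a sketch under stated assumptions: the `Stage` class, the thresholds, and the stand-in `fake_mean_score` function are hypothetical and do not represent REINVENT 4's implementation.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    threshold: float  # mean score required to leave this stage
    max_steps: int    # stop trying after this many steps


def fake_mean_score(step_in_stage: int) -> float:
    """Stand-in for a real scoring run: rises by 0.25 per step, capped at 1.0.

    The score restarts at every stage, mimicking a harder task on which
    the model initially performs worse.
    """
    return min(1.0, 0.25 * (step_in_stage + 1))


def run_curriculum(stages):
    """Run stages in order; return (stage name, steps used) for each stage visited."""
    history = []
    for stage in stages:
        steps_used = 0
        passed = False
        for step in range(stage.max_steps):
            steps_used = step + 1
            if fake_mean_score(step) >= stage.threshold:
                passed = True
                break
        history.append((stage.name, steps_used))
        if not passed:
            break  # do not advance to harder stages
    return history


stages = [
    Stage("easy", threshold=0.5, max_steps=10),
    Stage("medium", threshold=0.75, max_steps=10),
    Stage("hard", threshold=1.0, max_steps=10),
]
history = run_curriculum(stages)
```

In a real run the stand-in score would be replaced by the mean score of molecules sampled from the model, and the model's weights would carry over from one stage to the next.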