MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design

Morgan Thomas, Noel M. O’Boyle, Andreas Bender, Chris De Graaf
DOI: https://doi.org/10.1186/s13321-024-00861-w
2024-06-02
Journal of Cheminformatics
Abstract: Generative models are undergoing rapid research and application to de novo drug design. To facilitate their application and evaluation, we present MolScore. MolScore already contains many drug-design-relevant scoring functions commonly used in benchmarks, such as molecular similarity, molecular docking, predictive models, synthesizability, and more. In addition, it provides performance metrics to evaluate generative model performance based on the chemistry generated. With this unification of functionality, MolScore re-implements commonly used benchmarks in the field (such as GuacaMol, MOSES, and MolOpt). Moreover, new benchmarks can be created trivially. We demonstrate this by testing a chemical language model with reinforcement learning on three new tasks of increasing complexity related to the design of 5-HT2a ligands that utilise either molecular descriptors, 266 pre-trained QSAR models, or dual molecular docking. Lastly, MolScore can be integrated into an existing Python script with just three lines of code. This framework is a step towards unifying generative model application and evaluation as applied to drug design for both practitioners and researchers. The framework can be found on GitHub and downloaded directly from the Python Package Index.
chemistry, multidisciplinary, computer science, interdisciplinary applications, information systems
What problem does this paper attempt to address?
This paper addresses several current challenges in evaluating and applying generative models in de novo drug design. Specifically:

1. **Lack of consideration for generated chemical types**: Many existing generative models do not fully consider whether the chemistry they generate suits the needs of drug development.
2. **Insufficient relevance of objectives**: Many models are still applied to objectives that have little relation to actual drug discovery, such as rediscovering specific molecules or optimizing the penalized logP value.
3. **Neglect of novelty**: The scientific significance of the novelty of proposed de novo designed molecules is often overlooked.
4. **Lack of standardized evaluation**: Simple, easy-to-implement objectives are useful when prospective validation of all models is not possible, but there is still a need for standardized evaluation methods that reflect the challenges of real-world drug discovery.

To address these challenges, the paper proposes MolScore, a framework for scoring, evaluating, and benchmarking generative models in de novo drug design. The main contributions of MolScore are:

- **Unifying existing benchmarks**: MolScore re-implements commonly used benchmarks in the field (such as GuacaMol, MOSES, and MolOpt) and allows new benchmark tasks to be created easily.
- **Providing flexible scoring functions**: MolScore includes many scoring functions related to drug design, such as molecular similarity, molecular docking, predictive models, and synthesizability, and provides performance metrics to evaluate generative models based on the chemistry they generate.
- **Easy integration and use**: MolScore can be integrated into an existing Python script with just three lines of code.
- **Supporting the design of multi-parameter objectives**: MolScore supports not only the optimization of a single objective but also the design of multi-parameter objectives, which is very important in actual drug design.

Through these functions, MolScore aims to give practitioners and researchers in drug design a unified tool for applying and evaluating generative models.
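The multi-parameter objectives and the short integration pattern described above can be illustrated with a minimal sketch. This is not MolScore's API: `Objective`, `linear`, and `ToyScorer` are hypothetical names introduced here, and the two "descriptors" (string length and nitrogen count of a SMILES string) are toy stand-ins for real scoring functions such as docking or QSAR models. The sketch assumes each raw score is first rescaled to [0, 1] and then combined by a weighted arithmetic or geometric mean, which is one common way to build such an objective.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, List, Sequence


@dataclass
class Objective:
    """One scoring term: a raw score, a transform to [0, 1], and a weight."""
    name: str
    score_fn: Callable[[str], float]      # raw score computed from a SMILES string
    transform: Callable[[float], float]   # maps the raw score into [0, 1]
    weight: float = 1.0


def linear(low: float, high: float) -> Callable[[float], float]:
    """Return a transform that rescales [low, high] to [0, 1], clipping outside."""
    def _transform(x: float) -> float:
        return max(0.0, min(1.0, (x - low) / (high - low)))
    return _transform


class ToyScorer:
    """Hypothetical stand-in for a scoring framework (not MolScore itself)."""

    def __init__(self, objectives: Sequence[Objective], aggregation: str = "mean"):
        if aggregation not in ("mean", "geometric"):
            raise ValueError("aggregation must be 'mean' or 'geometric'")
        self.objectives = list(objectives)
        self.aggregation = aggregation
        self.history: List[dict] = []  # one record per scored molecule

    def score(self, smiles_batch: Sequence[str]) -> List[float]:
        total_weight = sum(o.weight for o in self.objectives)
        results = []
        for smiles in smiles_batch:
            # Transformed per-objective scores, each in [0, 1].
            parts = {o.name: o.transform(o.score_fn(smiles)) for o in self.objectives}
            if self.aggregation == "mean":
                # Weighted arithmetic mean: a poor term can be compensated by others.
                agg = sum(o.weight * parts[o.name] for o in self.objectives) / total_weight
            else:
                # Weighted geometric mean: any term at zero drives the total to zero.
                agg = prod(parts[o.name] ** o.weight for o in self.objectives) ** (1.0 / total_weight)
            self.history.append({"smiles": smiles, **parts, "score": agg})
            results.append(agg)
        return results


# Toy objectives (assumptions for illustration only, not real chemistry).
objectives = [
    Objective("size", len, linear(0, 20)),
    Objective("nitrogens", lambda s: s.count("N"), linear(0, 2)),
]

# The integration pattern: construct once, then score each generated batch.
scorer = ToyScorer(objectives, aggregation="mean")
batch = ["CCO", "NCCN"]          # would come from the generative model
scores = scorer.score(batch)     # would feed back into reinforcement learning
```

The choice of aggregation matters in practice: an arithmetic mean lets a strong score on one objective offset a failure on another, whereas a geometric mean requires every objective to be at least partly satisfied.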