QSPRpred: a Flexible Open-Source Quantitative Structure-Property Relationship Modelling Tool

Helle W. van den Maagdenberg, Martin Šícho, David Alencar Araripe, Sohvi Luukkonen, Linde Schoenmaker, Michiel Jespers, Olivier J. M. Béquignon, Marina Gorostiola González, Remco L. van den Broek, Andrius Bernatavicius, J. G. Coen van Hasselt, Piet H. van der Graaf, Gerard J. P. van Westen

DOI: https://doi.org/10.26434/chemrxiv-2024-m9989

2024-03-05

Abstract: Building reliable and robust quantitative structure-property relationship (QSPR) models is a challenging task. First, the experimental data needs to be obtained, analyzed and curated. Second, the number of available methods is continuously growing and evaluating different algorithms and methodologies can be arduous. Finally, the last hurdle that researchers face is to ensure the reproducibility of their models and facilitate their transferability into practice. In this work, we introduce QSPRpred, a toolkit for analysis of bioactivity data sets and QSPR modelling, which attempts to address the aforementioned challenges. QSPRpred's modular Python API enables users to intuitively describe different parts of a modelling workflow using a plethora of pre-implemented components, but also integrate customized implementations in a "plug-and-play" manner. QSPRpred data sets and models are directly serializable, which means they can be readily reproduced and put into operation after training as the models are saved with all required data pre-processing steps to make predictions on new compounds directly from SMILES strings. The general-purpose character of QSPRpred is also demonstrated by inclusion of support for multi-task and proteochemometric modelling. The package is extensively documented and comes with a large collection of tutorials to help new users. In this paper, we describe all of QSPRpred's functionalities and also conduct a small benchmarking case study to illustrate how different components can be leveraged to compare a diverse set of models. QSPRpred is fully open-source and available at https://github.com/CDDLeiden/QSPRpred.

Scientific Contribution: QSPRpred aims to provide a complex, but comprehensive Python API to conduct all tasks encountered in QSPR modelling from data preparation and analysis to model creation and model deployment. In contrast to similar packages, QSPRpred offers a wider and more exhaustive range of capabilities and integrations with many popular packages that also go beyond QSPR modelling. A significant contribution of QSPRpred is also in its automated and highly standardized serialization scheme, which significantly improves reproducibility and transferability of models.

Chemistry

What problem does this paper attempt to address?

The paper addresses three main problems:

1. **Complexity of data acquisition, analysis and curation**: Obtaining, analyzing and curating experimental data is a challenge when building a reliable quantitative structure-property relationship (QSPR) model. QSPRpred can collect data from different sources and supports multiple data formats, which simplifies this process.
2. **Selection and evaluation of algorithms and methods**: As the number of available methods keeps growing, evaluating different algorithms and methodologies becomes difficult. QSPRpred provides a modular Python API and a large number of pre-implemented components, so users can describe the different parts of a modelling workflow and integrate custom implementations.
3. **Reproducibility and practicality of the model**: The last challenge is to ensure that models are reproducible and can be transferred into practice. QSPRpred data sets and models are directly serializable. Models are saved together with their data pre-processing steps, so they can be put into operation after training without repeating any preparatory steps.

Specifically, QSPRpred aims to:

- **Simplify the QSPR modelling process**: Provide tools that cover the whole workflow, from data preparation to model deployment.
- **Improve the reproducibility and portability of the model**: Use an automated serialization scheme so that a model can be reproduced and applied to new compounds.
- **Support multi-task and proteochemometric modelling (PCM)**: Support not only traditional single-task QSPR models, but also multi-task and PCM models for more complex bioactivity data sets.
- **Provide benchmarking and comparison functions**: Allow users to systematically compare different algorithms, molecular representations and model development strategies in order to select the most suitable one.

Through these functions, QSPRpred aims to give researchers a flexible tool for performing QSPR modelling more efficiently in drug discovery and other fields.
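
The serialization idea in point 3 can be illustrated with a minimal, standard-library-only sketch. It does not use QSPRpred or reproduce its API. The `featurize` function, the `BundledModel` class and the hand-picked weights are all hypothetical stand-ins, and the character-count featurizer is a toy (it would, for example, miscount `Cl` as carbon). The sketch only shows why storing the pre-processing settings together with the model parameters lets a reloaded model predict directly from SMILES strings.

```python
import json
from collections import Counter


def featurize(smiles, alphabet):
    """Toy featurizer (assumption): count selected characters in a SMILES string."""
    counts = Counter(smiles)
    return [float(counts[ch]) for ch in alphabet]


class BundledModel:
    """Toy linear model that keeps its preprocessing settings next to its weights."""

    def __init__(self, alphabet, weights, bias):
        self.alphabet = list(alphabet)
        self.weights = list(weights)
        self.bias = float(bias)

    def predict(self, smiles_list):
        # Preprocessing runs inside predict, so callers pass raw SMILES strings.
        preds = []
        for smiles in smiles_list:
            x = featurize(smiles, self.alphabet)
            preds.append(self.bias + sum(w * v for w, v in zip(self.weights, x)))
        return preds

    def to_json(self):
        # The featurizer settings are saved together with the model parameters.
        state = {"alphabet": self.alphabet, "weights": self.weights, "bias": self.bias}
        return json.dumps(state)

    @classmethod
    def from_json(cls, text):
        state = json.loads(text)
        return cls(state["alphabet"], state["weights"], state["bias"])


# Illustrative, hand-picked parameters; they are not fitted to any data.
model = BundledModel(alphabet=["C", "N", "O"], weights=[0.5, -1.0, -0.25], bias=1.0)
restored = BundledModel.from_json(model.to_json())
smiles = ["CCO", "CC(=O)N"]
print(restored.predict(smiles) == model.predict(smiles))
```

Because the restored object carries its own `alphabet`, the caller never has to remember which featurization was used at training time. That is the property the abstract attributes to QSPRpred's serialization scheme.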
