Machine learning-assisted c-RASAR modeling of a curated set of orally active nephrotoxic drugs: Similarity-based predictions from close source neighbors

Kunal Roy, Arkaprava Banerjee

DOI: https://doi.org/10.26434/chemrxiv-2024-57klw

2024-08-22

Abstract: Cheminformatics and Machine Learning (ML) have seen exponential progress in chemical risk assessment over the last decade, owing to their efficiency, accuracy, and reliability. The constant evolution of New Approach Methodologies (NAMs) has inspired researchers around the globe to move away from conventional approaches and adopt or develop new, "unconventional" methods. The classification Read-Across Structure-Activity Relationship (c-RASAR) is one such approach: it incorporates similarity and error-based information from the nearest neighboring compounds into an ML modeling framework, resulting in enhanced predictivity. Although this technique has so far been applied to molecular descriptors, in the present study we applied it to molecular fingerprints as well as conventional molecular descriptors, developing ML models from a recently reported, highly curated set of orally active nephrotoxic drugs. We first developed ML models using nine different linear and non-linear algorithms, separately on molecular descriptors and on MACCS fingerprints, generating 18 different ML QSAR models. Using the chemical spaces defined by the modeling descriptors and fingerprints, the similarity and error-based RASAR descriptors were computed, and the most discriminating RASAR descriptors were used to develop another set of 18 ML c-RASAR models. All 36 models were cross-validated 20 times with a 5-fold cross-validation strategy, and their predictivity was checked on the test set data. A multi-criteria decision-making strategy, the Sum of Ranking Differences (SRD) approach, was adopted to identify the best-performing model based on robustness and external validation parameters. This statistical analysis suggested that the c-RASAR models performed well overall, and the best-performing model was itself a c-RASAR model. This model was used to screen a true external data set prepared from the known nephrotoxic compounds of DrugBankDB. The results showed that the model efficiently identifies nephrotoxic compounds. The t-SNE analyses of the descriptor, fingerprint, and RASAR descriptor spaces indicated that the RASAR descriptors efficiently encode the chemical information, as is evident from the tight and distinct clustering of the data points. Additionally, the molecular descriptors and the corresponding RASAR descriptors were used to identify potential activity cliffs using the ARKA framework.
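The Sum of Ranking Differences step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the row-wise maximum is assumed as the reference column, all metrics are assumed to be "higher is better", and the model names and metric values are arbitrary placeholders rather than results from the paper.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def srd_scores(table):
    """Compute an unnormalized SRD value per model.

    table maps a model name to its list of metric values; every model
    lists the same metrics in the same order. The reference is assumed
    to be the best (maximum) value of each metric across models.
    """
    names = list(table)
    n_metrics = len(table[names[0]])
    reference = [max(table[m][r] for m in names) for r in range(n_metrics)]
    ref_ranks = average_ranks(reference)
    return {
        m: sum(abs(a - b) for a, b in zip(average_ranks(table[m]), ref_ranks))
        for m in names
    }


# Placeholder values only (hypothetical models, three hypothetical metrics).
toy_table = {
    "model_a": [0.90, 0.80, 0.70],
    "model_b": [0.60, 0.70, 0.65],
}
scores = srd_scores(toy_table)
best_model = min(scores, key=scores.get)  # smaller SRD = closer to the reference
print(scores, best_model)
```

A smaller SRD means the model ranks the metrics more like the reference does. In practice SRD values are usually normalized and checked against a randomization test, which this sketch omits.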

Chemistry

What problem does this paper attempt to address?

This paper addresses the problem of predicting the nephrotoxicity of orally administered drugs at an early stage of drug discovery, so as to avoid clinical trial failures and wasted resources caused by nephrotoxicity. Specifically, the authors developed a prediction model that efficiently identifies nephrotoxic compounds, using the machine learning-assisted classification read-across structure-activity relationship (c-RASAR) method on a carefully curated dataset of orally active nephrotoxic drugs. With this method, the researchers aim to provide a fast, reliable, and cost-effective means of assessing the nephrotoxic potential of drugs, so that potentially harmful compounds can be screened out early in drug development.
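The idea of deriving similarity-based features from close source neighbors can be sketched as follows. This is an illustration under stated assumptions, not the descriptor set used in the paper: Tanimoto similarity on fingerprint on-bits is assumed, the feature names (`weighted_label`, `mean_sim`, `sd_sim`) are hypothetical, the error-based descriptors are not covered, and the fingerprints are toy data.

```python
from statistics import mean, pstdev


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def read_across_features(query, sources, k=3):
    """Summarize the k most similar source compounds for one query.

    sources is a list of (fingerprint_bits, label) pairs, where label is
    1 for toxic and 0 for non-toxic (hypothetical encoding).
    """
    scored = sorted(
        ((tanimoto(query, bits), label) for bits, label in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    sims = [s for s, _ in scored]
    total = sum(sims)
    # Similarity-weighted average of neighbor labels (a read-across style score).
    weighted = sum(s * y for s, y in scored) / total if total else 0.0
    return {
        "weighted_label": weighted,
        "mean_sim": mean(sims),
        "sd_sim": pstdev(sims),
    }


# Toy fingerprints given as sets of on-bit indices (not real MACCS keys).
toy_sources = [
    ({1, 2, 3, 4}, 1),
    ({1, 2, 3}, 1),
    ({1, 2, 5, 6}, 0),
    ({7, 8}, 0),
]
features = read_across_features({1, 2, 3, 4}, toy_sources, k=3)
print(features)
```

Features of this kind, computed for every compound against the training set, could then be fed to any classifier in place of, or alongside, the original descriptors.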

From chemical similarity measures to an unconventional modeling framework: The application of c-RASAR along with dimensionality reduction techniques in a representative hepatotoxicity dataset

The application of chemical similarity measures in an unconventional modeling framework c-RASAR along with dimensionality reduction techniques to a representative hepatotoxicity dataset

Molecular Similarity in Predictive Toxicology with a Focus on the q-RASAR Technique

Molecular similarity in chemical informatics and predictive toxicity modeling: from quantitative read-across (q-RA) to quantitative read-across structure–activity relationship (q-RASAR) with the application of machine learning

Quantitative Read-Across Structure-Activity Relationship (q-RASAR): A novel approach to estimate the subchronic oral safety (NOAEL) of diverse organic chemicals in rats

How Precise Are Our Quantitative Structure-Activity Relationship Derived Predictions for New Query Chemicals?

Prediction Model of Clearance by a Novel Quantitative Structure–Activity Relationship Approach, Combination DeepSnap-Deep Learning and Conventional Machine Learning

Machine learning-based q-RASAR predictions of the bioconcentration factor of organic molecules estimated following the organisation for economic co-operation and development guideline 305

ARKA: A framework of dimensionality reduction for machine-learning classification modeling, risk assessment, and data gap-filling of sparse environmental toxicity data

Accurate Clinical Toxicity Prediction using Multi-task Deep Neural Nets and Contrastive Molecular Explanations

Initial Development of Automated Machine Learning-Assisted Prediction Tools for Aryl Hydrocarbon Receptor Activators

An Explainable Supervised Machine Learning Model for Predicting Respiratory Toxicity of Chemicals Using Optimal Molecular Descriptors

Quantitative nanostructure-activity relationship modeling

Leveraging Machine Learning to Facilitate Individual Case Causality Assessment of Adverse Drug Reactions

Development of Quantitative Structure-Activity Relationship Models to Predict Potential Nephrotoxic Ingredients in Traditional Chinese Medicines

ToxTree: descriptor-based machine learning models for both hERG and Nav1.5 cardiotoxicity liability predictions

Validating ADME QSAR Models Using Marketed Drugs

Elucidating molecular mechanism and chemical space of chalcones through biological networks and machine learning approaches

Artificial Intelligence and Machine Learning Models for Predicting Drug-Induced Kidney Injury in Small Molecules

Systematic Evaluation of Local and Global Machine Learning Models for the Prediction of ADME Properties