MolToxPred: small molecule toxicity prediction using machine learning approach

Anjali Setiya, Vinod Jani, Uddhavesh Sonavane, Rajendra Joshi

DOI: https://doi.org/10.1039/d3ra07322j

IF: 4.036

2024-01-31

RSC Advances

Abstract: Chemicals and products of many kinds may pose health risks when administered to the human body. Because of toxicity concerns, the number of new drugs entering the market through the conventional drug development process has declined over the years. With the advent of big data and artificial intelligence, machine learning (ML) techniques have emerged as a potential solution for predicting toxicity and supporting efficient drug development and chemical safety. An ML model for toxicity prediction can reduce experimental costs and time while addressing ethical concerns by drastically reducing the need for animal testing and clinical trials. Herein, MolToxPred, an ML-based tool, has been developed using a stacked model approach to predict the potential toxicity of small molecules and metabolites. The stacked model consists of random forest, multi-layer perceptron, and LightGBM as base classifiers and logistic regression as the meta-classifier. For training and validation, a comprehensive set of toxic and non-toxic molecules was curated. Structural and physicochemical features in the form of molecular descriptors and fingerprints were employed. MolToxPred uses a comprehensive feature selection process and optimizes its hyperparameters through Bayesian optimization with stratified 5-fold cross-validation. In the evaluation phase, MolToxPred achieved an AUROC of 87.76% on the test set and 88.84% on an external validation set. The McNemar test was used as the post-hoc test to determine whether the stacked model's performance differed significantly from that of the base learners. The stacked model outperformed its base classifiers and an existing tool from the literature. The hypothesis is that the incorporation of a diverse set of data, the subsequent feature selection, and a stacked ensemble approach give MolToxPred the edge over other methods. In addition, an attempt has been made to identify structural alerts responsible for endpoints of the Tox21 data to determine the association of a molecule with a plausible downstream pathway of action. MolToxPred may be helpful for drug discovery and regulatory pipelines in pharmaceutical and other industries for in silico toxicity prediction of small molecule candidates.
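The abstract names the McNemar test as the post-hoc comparison between the stacked model and its base learners but does not say which variant was used. The following is a minimal sketch of the exact (binomial) variant, assuming paired binary predictions on the same test molecules; the function name `mcnemar_exact` and the toy labels are hypothetical and are not taken from the paper.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions (illustrative sketch).

    Returns (b, c, p_value), where
      b = samples classifier A gets right and classifier B gets wrong,
      c = samples classifier A gets wrong and classifier B gets right.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        # No discordant pairs: the two classifiers are indistinguishable here.
        return b, c, 1.0
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (hypothetical): the "stacked" predictions are all correct,
# the "base" predictions are wrong on the first six samples.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
stacked = [1, 0, 1, 0, 1, 0, 1, 0]
base = [0, 1, 0, 1, 0, 1, 1, 0]
print(mcnemar_exact(y_true, stacked, base))  # (6, 0, 0.03125)
```

Only the discordant pairs enter the test, which is why it suits comparing two classifiers evaluated on the same samples. For larger test sets a chi-square approximation is commonly used in place of the exact binomial tail.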

chemistry, multidisciplinary

What problem does this paper attempt to address?

The paper addresses the problem of predicting the toxicity of chemical substances. Specifically, the researchers developed a machine learning tool named MolToxPred to predict the potential toxicity of small molecules and metabolites. In traditional drug development, toxicological screening is time-consuming, costly, and inefficient, which contributes to a decline in the number of new drugs entering the market. Animal experiments are also controversial in terms of both ethics and accuracy. MolToxPred therefore uses machine learning to improve the accuracy and efficiency of toxicity prediction, reduce experimental costs and time, and minimize reliance on animal testing. It employs a stacked model approach, combining Random Forest, Multilayer Perceptron, and LightGBM as base classifiers with Logistic Regression as the meta-classifier. It uses a comprehensive feature selection process and Bayesian optimization for hyperparameter tuning, and it was evaluated on a held-out test set and an external validation set. MolToxPred achieved an AUROC of 87.76% on the test set and 88.84% on the external validation set, outperforming its base classifiers and an existing tool. In summary, the main goal of the paper is to develop an efficient and accurate machine learning tool for predicting the toxicity of small molecule candidates, thereby assisting drug discovery and regulatory processes in the pharmaceutical and other industries.
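To make the reported metric concrete, here is a minimal sketch of how AUROC can be computed from binary labels and predicted scores, using its rank interpretation: the probability that a randomly chosen toxic molecule receives a higher score than a randomly chosen non-toxic one. The function name `auroc` and the toy scores are hypothetical; this is not the authors' evaluation code.

```python
def auroc(y_true, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    Runs in O(P * N) time, which is fine for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 1 = toxic, 0 = non-toxic.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, predicted))  # 0.75
```

On this scale, the reported 87.76% corresponds to a value of 0.8776, where 0.5 is chance level and 1.0 is a perfect ranking.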
