Abstract: Different types of chemicals and products may pose various health risks when introduced into the human body. Owing to toxicity concerns, the number of new drugs reaching the market through the conventional drug development process has declined over the years. However, with the advent of big data and artificial intelligence, machine learning (ML) techniques have emerged as a potential solution for predicting toxicity and ensuring efficient drug development and chemical safety. An ML model for toxicity prediction can reduce experimental costs and time while addressing ethical concerns by drastically reducing the need for animal experiments and clinical trials. Herein, MolToxPred, an ML-based tool, has been developed using a stacked model approach to predict the potential toxicity of small molecules and metabolites. The stacked model consists of random forest, multi-layer perceptron, and LightGBM as base classifiers and logistic regression as the meta classifier. For training and validation, a comprehensive set of toxic and non-toxic molecules was curated. Structural and physicochemical features in the form of molecular descriptors and fingerprints were employed. MolToxPred uses a comprehensive feature selection process and optimizes its hyperparameters through Bayesian optimization with stratified 5-fold cross-validation. In the evaluation phase, MolToxPred achieved an AUROC of 87.76% on the test set and 88.84% on an external validation set. The McNemar test was used as the post-hoc test to determine whether the stacked model's performance differed significantly from that of the base learners. The developed stacked model outperformed its base classifiers and an existing tool in the literature. The hypothesis is that the incorporation of a diverse set of data, the subsequent feature selection, and a stacked ensemble approach give MolToxPred the edge over other methods. In addition, an attempt has been made to identify structural alerts responsible for endpoints of the Tox21 data to determine the association of a molecule with a plausible downstream pathway of action. MolToxPred may be helpful for drug discovery and regulatory pipelines in pharmaceutical and other industries for in silico toxicity prediction of small molecule candidates.
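The McNemar comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which variant of the test was used, so the exact (binomial) form below is an assumption. The function name `mcnemar_exact` and the toy labels are hypothetical and are not taken from MolToxPred. The test looks only at discordant pairs, that is, molecules that one classifier labels correctly and the other does not.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact (binomial) McNemar test on paired predictions from two classifiers.

    Returns (b, c, p_value), where b counts samples that only model A
    classified correctly and c counts samples that only model B
    classified correctly.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok, other_ok = (a == truth), (other == truth)
        if a_ok and not other_ok:
            b += 1
        elif other_ok and not a_ok:
            c += 1

    n = b + c
    if n == 0:
        return b, c, 1.0  # the two models never disagree

    # Under H0 the discordant pairs split 50/50, so b ~ Binomial(n, 0.5).
    # Two-sided p-value: twice the smaller tail, capped at 1.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Hypothetical toy data: 1 = toxic, 0 = non-toxic (not real results).
y_true = [1, 0] * 6
stacked = list(y_true)
stacked[0] = 1 - stacked[0]          # stacked model errs on one sample
base = list(y_true)
for i in range(1, 9):                # base learner errs on eight samples
    base[i] = 1 - base[i]

b, c, p_value = mcnemar_exact(y_true, stacked, base)
print(b, c, p_value)
```

In this toy case the stacked model alone is correct on eight samples and the base learner alone on one, so the disagreement is lopsided enough to fall below the conventional 0.05 threshold. With larger discordant counts, a chi-squared approximation is a common alternative to the exact form.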

Stacked ensemble-based mutagenicity prediction model using multiple modalities with graph attention network

MolToxPred: small molecule toxicity prediction using machine learning approach

MutagenPred-GCNNs: A Graph Convolutional Neural Network-Based Classification Model for Mutagenicity Prediction with Data-Driven Molecular Fingerprints

Deep active learning with high structural discriminability for molecular mutagenicity prediction

ToxSTK: A Multi-Target Toxicity Assessment Utilizing Molecular Structure and Stacking Ensemble Learning

Predicting protein thermal stability changes upon single and multi-point mutations via restricted attention subgraph neural network

Prediction of molecular-specific mutagenic alerts and related mechanisms of chemicals by a convolutional neural network (CNN) model based on SMILES split

In Silico Prediction of Chemical Ames Mutagenicity

In Silico Prediction of Chemical Genotoxicity Using Machine Learning Methods and Structural Alerts

GeoScatt-GNN: A Geometric Scattering Transform-Based Graph Neural Network Model for Ames Mutagenicity Prediction

AMPred-CNN: Ames mutagenicity prediction model based on convolutional neural networks

Asking the right questions for mutagenicity prediction from BioMedical text

Accurate Clinical Toxicity Prediction using Multi-task Deep Neural Nets and Contrastive Molecular Explanations

Optimised stacked machine learning algorithms for genomics and genetics disorder detection in the healthcare industry

Amesformer: a graph transformer neural network for mutagenicity prediction

Development of a robust Machine learning model for Ames test outcome prediction

The enhancement scheme for the predictive ability of QSAR: A case of mutagenicity

A deep learning based multi-model approach for predicting drug-like chemical compound's toxicity

Investigation of model stacking for drug sensitivity prediction

Amesformer: State-of-the-Art Mutagenicity Prediction with Graph Transformers

StackACPred: Prediction of Anticancer Peptides by Integrating Optimized Multiple Feature Descriptors with Stacked Ensemble Approach