Abstract: Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches that model toxicological data with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities from chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline, implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform, that uses machine learning models to predict the most important protein features responsible for the toxicity of compounds taken from the Tox21 dataset. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest combined with SMOTE+ENN, a resampling method that pairs the Synthetic Minority Oversampling Technique (SMOTE) with the Edited Nearest Neighbor (ENN) method to balance the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were two of the top-performing endpoints among the twelve toxicity endpoints, with AUCROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the enriched pathways and proteins selected by our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement.
Our work elucidates significant relationships between the protein-compound interactions computed using CANDO and the biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also to elucidate therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints.
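The SMOTE+ENN resampling step named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the toy two-dimensional data, the function names (`nearest`, `smote`, `enn`), the neighbor count `k=3`, and the majority-vote ENN rule are all assumptions chosen for illustration, and the random forest classifier itself is omitted. In practice a library such as imbalanced-learn (`imblearn.combine.SMOTEENN`) would typically be used instead.

```python
import math
import random


def nearest(idx, points, k):
    """Indices of the k points closest to points[idx] (Euclidean), excluding idx."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority=1, k=3, seed=0):
    """Add synthetic minority samples until both classes have equal size."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(y) - len(minority_pts)) - len(minority_pts)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(nearest(i, minority_pts, k))
        t = rng.random()  # interpolation weight in [0, 1)
        a, b = minority_pts[i], minority_pts[j]
        # New point lies on the segment between a sample and one of its neighbors.
        X_out.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Drop samples whose label disagrees with the majority of their k neighbors."""
    keep = []
    for i in range(len(X)):
        votes = [y[j] for j in nearest(i, X, k)]
        if max(set(votes), key=votes.count) == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical imbalanced toy data: 40 "inactive" vs 6 "active" compounds.
data_rng = random.Random(42)
inactive = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
active = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(6)]
X = inactive + active
y = [0] * 40 + [1] * 6

X_sm, y_sm = smote(X, y)         # oversample the minority class
X_res, y_res = enn(X_sm, y_sm)   # then clean noisy or borderline samples
print("after SMOTE:", y_sm.count(0), "inactive,", y_sm.count(1), "active")
print("after ENN:  ", y_res.count(0), "inactive,", y_res.count(1), "active")
```

The resampled `X_res` and `y_res` would then be used to train the classifier, with the held-out test data left unresampled so that metrics such as AUCROC reflect the original class distribution.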

Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
