Abstract: In silico methods are essential to the safety evaluation of chemicals. Computational risk assessment offers several approaches, among which data science and knowledge-based methods form an increasingly important subgroup. A key strength of data science is that it uses existing data to find correlations, build strong hypotheses, and create new, valuable knowledge that may help reduce the number of resource-intensive experiments. When choosing a suitable method for toxicity prediction, two essential factors to consider are the available data and the desired toxicity endpoint. The complexity of the endpoint can affect the success rate of in silico models. For highly complex endpoints such as hepatotoxicity, it can be beneficial to decipher the toxic event from a more systemic point of view. We propose a data science-based modelling pipeline that uses compounds' connections to tissue-specific biological targets, the interactome, and biological pathways as compound descriptors. Models trained on different combinations of the collected compound-target, compound-interactor, and compound-pathway profiles were used to predict the hepatotoxicity of drug-like compounds. Several tree-based models were trained using target, interactome, and pathway level variables, both separately and combined. The model using the combined descriptors of all levels and the random forest algorithm was further optimized. The importance of each descriptor for model performance was assessed and examined for a biological explanation, to identify which targets or pathways may play a crucial role in toxicity. Descriptors connected to cytochrome P450 enzymes, heme degradation, and biological oxidation received high weights. Furthermore, processes less often discussed in connection with toxicity, such as the involvement of RHO GTPase effectors in hepatotoxicity, were marked as fundamental.
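The descriptor construction described above can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation. The compound-target links, interactions, and pathway memberships are hypothetical toy data. The interactor level is assumed to be the union of the interaction partners of a compound's targets. A pathway is assumed to be hit when it contains at least one of the compound's targets. Each level becomes a 0/1 vector over a fixed, sorted vocabulary, and the combined descriptor is the concatenation of the three levels.

```python
from typing import Dict, List, Set

# Hypothetical toy data (assumption): compound -> tissue-specific (liver) targets.
COMPOUND_TARGETS: Dict[str, Set[str]] = {
    "cpd_A": {"CYP3A4", "HMOX1"},
    "cpd_B": {"RHOA"},
}
# Hypothetical interactome (assumption): target -> interaction partners.
INTERACTORS: Dict[str, Set[str]] = {
    "CYP3A4": {"POR"},
    "HMOX1": {"BLVRB"},
    "RHOA": {"ROCK1"},
}
# Hypothetical pathway memberships (assumption): pathway -> member proteins.
PATHWAYS: Dict[str, Set[str]] = {
    "Biological oxidations": {"CYP3A4", "POR"},
    "Heme degradation": {"HMOX1", "BLVRB"},
    "RHO GTPase effectors": {"RHOA", "ROCK1"},
}

# Fixed, sorted vocabularies so every compound gets its columns in the same order.
TARGET_VOCAB = sorted(set().union(*COMPOUND_TARGETS.values()))
INTERACTOR_VOCAB = sorted(set().union(*INTERACTORS.values()))
PATHWAY_VOCAB = sorted(PATHWAYS)


def binary_profile(hits: Set[str], vocabulary: List[str]) -> List[int]:
    """Encode a set of hits as a 0/1 vector over an ordered vocabulary."""
    return [1 if item in hits else 0 for item in vocabulary]


def combined_descriptor(compound: str) -> List[int]:
    """Concatenate target, interactor, and pathway profiles for one compound."""
    targets = COMPOUND_TARGETS[compound]
    # Assumption: interactor level = union of the targets' interaction partners.
    interactors: Set[str] = set()
    for target in targets:
        interactors |= INTERACTORS.get(target, set())
    # Assumption: a pathway is hit if it contains any of the compound's targets.
    pathways = {name for name, members in PATHWAYS.items() if members & targets}
    return (
        binary_profile(targets, TARGET_VOCAB)
        + binary_profile(interactors, INTERACTOR_VOCAB)
        + binary_profile(pathways, PATHWAY_VOCAB)
    )


if __name__ == "__main__":
    for cpd in sorted(COMPOUND_TARGETS):
        print(cpd, combined_descriptor(cpd))
```

Separate-level models would use only one of the three slices. The combined model uses the whole concatenated vector, so the same encoding supports both settings.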
The optimized combined model using only the selected descriptors yielded the best performance, with an accuracy of 0.766. On the same dataset, models using classical Morgan fingerprints as the compound representation performed similarly, as did models combining systems biology-based descriptors with Morgan fingerprints. Consequently, adding the structural information of compounds did not enhance the predictive value of the models. The developed systems biology-based pipeline is a valuable tool for predicting toxicity, while also providing novel insights into the possible mechanisms of these unwanted events.
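The evaluation steps mentioned here can be illustrated with a small stdlib-only sketch. These steps are keeping only selected descriptors, optionally appending fingerprint bits, and scoring by accuracy. The sketch does not train a model. The importances are hypothetical values of the kind a fitted random forest reports, for instance through scikit-learn's `feature_importances_` attribute. Selecting the top-k descriptors by importance is an assumed rule, since the selection procedure is not specified here. `morgan_bits` is a placeholder rather than a real Morgan fingerprint, which would normally be computed with a cheminformatics toolkit such as RDKit. Accuracy is the fraction of compounds whose predicted hepatotoxicity label matches the reference label.

```python
from typing import Dict, List, Sequence


def select_descriptors(names: Sequence[str], importances: Sequence[float], top_k: int) -> List[str]:
    """Return the top_k descriptor names by importance (ties broken alphabetically)."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    ranked = sorted(zip(names, importances), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_k]]


def build_row(bio: Dict[str, int], selected: Sequence[str], fingerprint: Sequence[int] = ()) -> List[int]:
    """Selected systems-biology descriptors, optionally followed by fingerprint bits."""
    return [bio[name] for name in selected] + list(fingerprint)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical descriptor names and importances (illustrative only).
    names = ["CYP3A4", "Heme degradation", "RHO GTPase effectors", "ALB"]
    importances = [0.30, 0.25, 0.20, 0.01]
    selected = select_descriptors(names, importances, top_k=3)

    bio = {"CYP3A4": 1, "Heme degradation": 1, "RHO GTPase effectors": 0, "ALB": 0}
    morgan_bits = [0, 1, 0, 0]  # placeholder, not a real Morgan fingerprint
    print(build_row(bio, selected))               # systems-biology descriptors only
    print(build_row(bio, selected, morgan_bits))  # combined representation
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

A fixed `top_k` is only one possible selection rule. An importance threshold or cross-validated selection would fit the same interface.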

Toxicity prediction using target, interactome, and pathway profiles as descriptors

Predicting Organ Toxicity Using in Vitro Bioactivity Data and Chemical Structure.

Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology

Using ToxCast™ Data to Reconstruct Dynamic Cell State Trajectories and Estimate Toxicological Points of Departure.

Comprehensive data-driven analysis of the impact of chemoinformatic structure on the genome-wide biological response profiles of cancer cells to 1159 drugs

Visualizing dimensionality reduction of systems biology data

Visualization And Interpretation Of Multivariate Associations With Disease Risk Markers And Disease Risk-The Triplot

Predicting Organ Toxicity Using &Itin Vitro&It Bioactivity Data And Chemical Structure

Mining Toxicity Information from Large Amounts of Toxicity Data

Identification of Optimal Machine Learning Algorithms and Molecular Fingerprints for Explainable Toxicity Prediction Models Using ToxCast/Tox21 Bioassay Data

AI-driven Discovery of Morphomolecular Signatures in Toxicology