UniTox: Leveraging LLMs to Curate a Unified Dataset of Drug-Induced Toxicity from FDA Labels

Jake Silberg, Kyle Swanson, Elana Simon, Angela Zhang, Zaniar Ghazizadeh, Scott Ogden, Hisham Hamadeh, James Zou

DOI: https://doi.org/10.1101/2024.06.21.24309315

2024-06-22

Abstract: Drug-induced toxicity is one of the leading reasons new drugs fail clinical trials. Machine learning models that predict drug toxicity from molecular structure could help researchers prioritize less toxic drug candidates. However, current toxicity datasets are typically small and limited to a single organ system (e.g., cardio, renal, or liver). Creating these datasets often involved time-intensive expert curation by parsing drug label documents that can exceed 100 pages per drug. Here, we introduce UniTox, a unified dataset of 2,418 FDA-approved drugs with drug-induced toxicity summaries and ratings created by using GPT-4o to process FDA drug labels. UniTox spans eight types of toxicity: cardiotoxicity, liver toxicity, renal toxicity, pulmonary toxicity, hematological toxicity, dermatological toxicity, ototoxicity, and infertility. This is, to the best of our knowledge, the largest such systematic human in vivo database by number of drugs and toxicities, and the first covering nearly all FDA-approved medications for several of these toxicities. We recruited clinicians to validate a random sample of our GPT-4o annotated toxicities, and UniTox toxicity ratings concord with clinician labelers 87–96% of the time. Finally, we benchmark a graph neural network trained on UniTox to demonstrate the utility of this dataset for building molecular toxicity prediction models.

Health Informatics

What problem does this paper attempt to address?

The main problem this paper addresses is drug-induced toxicity, and specifically how machine-learning models can predict toxicity from molecular structure to help researchers prioritize drug candidates with lower toxicity. Current toxicity data sets are usually small and limited to a single organ system (such as the heart, kidney, or liver). Creating them is time-intensive, because experts must manually parse label documents that can exceed 100 pages per drug. To address this, the authors introduce **UniTox**, a unified data set of 2,418 FDA-approved drugs with drug-induced toxicity summaries and ratings, generated by processing FDA drug labels with GPT-4o. UniTox covers eight types of toxicity: cardiotoxicity, hepatotoxicity, nephrotoxicity, pulmonary toxicity, hematotoxicity, dermatological toxicity, ototoxicity, and infertility.

### Main contributions:

1. **Constructing the UniTox data set**: Large language models (LLMs) were used to quickly classify drug toxicity from FDA labels, producing a large human data set that spans multiple toxicities for 2,418 FDA-approved drugs.
2. **Verifying accuracy**: UniTox ratings were checked against existing data sets and against human annotation; they concord with clinician labelers 87–96% of the time.
3. **Clinical verification**: Clinicians were recruited to validate a random sample of the GPT-4o annotations, further supporting the reliability of UniTox.
4. **Model performance evaluation**: A graph neural network (GNN) was trained on UniTox, demonstrating the usefulness of the data set for building molecular toxicity prediction models.

### Method overview:

1. **Data collection and pre-processing**: 2,418 drugs and their labels were selected from the FDALabel database, and drugs with local, lavage, and intradermal routes of administration were removed.
2. **Generating toxicity ratings**: Toxicity ratings were generated with GPT-4o and chain-of-thought prompting through a two-layer prompt system. The first-layer prompt asks the model to summarize the information in the drug label about a specific type of toxicity. The second-layer prompt asks the model to produce a ternary (none / less / most) or binary (none / yes) toxicity rating based on that summary.
3. **External data set verification**: UniTox was compared with three FDA-designed data sets, DICTrank, DILIrank, and DIRIL, to evaluate its accuracy.
4. **Clinician verification**: For the five types of toxicity without existing verification data, clinicians were invited to manually verify 100 randomly sampled drugs.

### Results:

1. **UniTox data set**: It covers eight toxicity types for 2,418 drugs. Each drug has a toxicity summary generated by GPT-4o, ternary and binary toxicity ratings, and the SPL ID of the label used to generate the data.
2. **Verification results**: The comparison with DICTrank, DILIrank, and DIRIL shows that UniTox is significantly superior to existing methods in accuracy, especially for high-confidence predictions.
3. **Clinician verification**: Clinicians judged 87–96% of the drugs to be accurately rated, and the review surfaced some edge cases and potential directions for improving the model.
4. **GNN model performance**: The Chemprop-RDKit model trained on UniTox performs well in a multi-task setting and achieves relatively high ROC-AUC values across the different toxicity types.

### Conclusion:

UniTox is a large-scale, multi-toxicity data set. The use of large language models, together with verification by clinicians, supports its usefulness and reliability for drug toxicity prediction. It provides a valuable resource for future drug research and development and helps to improve the success rate and safety of clinical trials.
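The two-layer prompt system and the clinician concordance check described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the prompt wording, the function names (`summarize_prompt`, `rating_prompt`, `parse_rating`, `rate_drug`, `concordance`), and the convention that the rating appears on the final line of the response are all assumptions. The model is passed in as a plain callable, and `fake_llm` is a canned stand-in so the sketch runs without any API access.

```python
from typing import Callable, Dict, Sequence

# Rating scales as named in the summary; the exact label strings are assumed.
TERNARY = ("none", "less", "most")
BINARY = ("none", "yes")


def summarize_prompt(label_text: str, toxicity: str) -> str:
    """First-layer prompt: ask for a summary of one toxicity type."""
    return (
        f"Summarize all information about {toxicity} "
        f"in the drug label below.\n\n{label_text}"
    )


def rating_prompt(summary: str, toxicity: str, options: Sequence[str]) -> str:
    """Second-layer prompt: ask for step-by-step reasoning, then a rating."""
    return (
        f"Based on the summary below, rate the drug's {toxicity} as one of: "
        f"{', '.join(options)}. Reason step by step, then give only the "
        f"rating on the final line.\n\n{summary}"
    )


def parse_rating(response: str, options: Sequence[str]) -> str:
    """Read the rating from the last non-empty line of the response."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    last = lines[-1].lower().rstrip(".") if lines else ""
    for option in options:
        if last == option.lower():
            return option
    raise ValueError(f"no valid rating found in response: {response!r}")


def rate_drug(
    label_text: str,
    toxicity: str,
    llm: Callable[[str], str],
    options: Sequence[str] = TERNARY,
) -> Dict[str, str]:
    """Run both prompt layers and return the summary and parsed rating."""
    summary = llm(summarize_prompt(label_text, toxicity))
    response = llm(rating_prompt(summary, toxicity, options))
    return {"summary": summary, "rating": parse_rating(response, options)}


def concordance(model: Sequence[str], clinician: Sequence[str]) -> float:
    """Fraction of drugs where the model rating equals the clinician label."""
    if len(model) != len(clinician) or not model:
        raise ValueError("rating lists must be non-empty and equal in length")
    return sum(m == c for m, c in zip(model, clinician)) / len(model)


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model call (hypothetical responses)."""
    if prompt.startswith("Summarize"):
        return "The label warns of QT prolongation."
    return "The label describes a serious cardiac warning.\nmost"
```

Passing the model as a callable keeps the sketch independent of any particular client library; a real run would replace `fake_llm` with a function that sends the prompt to the chosen model and returns its text response.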
