Language Interaction Network for Clinical Trial Approval Estimation

Chufan Gao, Tianfan Fu, Jimeng Sun

2024-04-26

Abstract: Clinical trial outcome prediction seeks to estimate the likelihood that a clinical trial will successfully reach its intended endpoint. This process predominantly involves the development of machine learning models that utilize a variety of data sources such as descriptions of the clinical trials, characteristics of the drug molecules, and specific disease conditions being targeted. Accurate predictions of trial outcomes are crucial for optimizing trial planning and prioritizing investments in a drug portfolio. While previous research has largely concentrated on small-molecule drugs, there is a growing need to focus on biologics, a rapidly expanding category of therapeutic agents that often lack the well-defined molecular properties associated with traditional drugs. Additionally, applying conventional methods like graph neural networks to biologics data proves challenging due to their complex nature. To address these challenges, we introduce the Language Interaction Network (LINT), a novel approach that predicts trial outcomes using only the free-text descriptions of the trials. We have rigorously tested the effectiveness of LINT across three phases of clinical trials, where it achieved ROC-AUC scores of 0.770, 0.740, and 0.748 for phases I, II, and III, respectively, specifically concerning trials involving biologic interventions.
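For readers unfamiliar with the metric quoted in the abstract, ROC-AUC can be read as the probability that a randomly chosen successful trial receives a higher predicted score than a randomly chosen failed one. The snippet below is a minimal sketch of that definition in plain Python; the `roc_auc` helper, the labels, and the scores are hypothetical illustrations, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs ranked
    correctly, with ties counted as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = trial reached its endpoint, 0 = it did not.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]  # made-up predicted probabilities
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported values of 0.770, 0.740, and 0.748. The quadratic pairwise loop is chosen for clarity; evaluation libraries usually compute the same quantity from sorted ranks.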

Biomolecules, Computation and Language, Machine Learning

What problem does this paper attempt to address?

This paper addresses clinical trial outcome prediction: estimating whether a trial will reach its intended endpoint, which matters for trial planning and for prioritizing investment across a drug portfolio. Existing machine learning models typically draw on trial descriptions, drug molecular properties, and disease conditions, but they struggle with biologics, a rapidly growing class of therapeutics whose molecular properties are less well defined than those of traditional small-molecule drugs.

The paper proposes the Language Interaction Network (LINT), which predicts trial outcomes solely from the free-text descriptions of clinical trials. LINT builds on pre-trained language models such as BERT and combines drug information with relevant medical codes (ICD codes) to predict the outcomes of Phase I, II, and III trials. On trials involving biologics, it achieves ROC-AUC scores of 0.770, 0.740, and 0.748 for Phases I, II, and III, respectively, outperforming traditional models. LINT is also interpretable: Shapley values highlight the parts of the input text that most influence a prediction. Compared with previous work, LINT uses a larger dataset covering both small-molecule drugs and biologics, and it can handle complex text and tabular data.

The paper also discusses remaining challenges, such as limited training data and the diversity of trial types, and notes that LINT can serve as an open-source framework adaptable to new pre-trained language models. Future directions may include unsupervised learning strategies to expand annotated datasets, improved label quality, and more interpretable models that help optimize trial design and raise success rates.
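To make the Shapley-value interpretation mentioned above more concrete, the sketch below computes exact Shapley values for a toy scorer over three text fields of a trial record. Everything here is an assumption for illustration: the field names, the `toy_score` function, and its coefficients are hypothetical and are not the paper's model or its attribution code. Exact enumeration is only feasible for a handful of features; practical tools approximate it.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley values: each feature's marginal contribution,
    averaged over all subsets of the remaining features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # size of the subset that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_score(present):
    """Hypothetical success score given which text fields are visible."""
    score = 0.50  # baseline with no fields shown
    if "intervention" in present:
        score += 0.10
    if "eligibility" in present:
        score -= 0.06
    if "intervention" in present and "condition" in present:
        score += 0.08  # interaction between the two fields
    return score


fields = ["intervention", "condition", "eligibility"]
phi = shapley_values(fields, toy_score)
for name, value in phi.items():
    print(f"{name:>12}: {value:+.3f}")
```

The attributions sum to the difference between the score with all fields present and the baseline score, and the interaction term is split evenly between the two fields that produce it. That additivity is what lets a per-field or per-token attribution be read as a share of the prediction.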
