CliniDigest: A Case Study in Large Language Model Based Large-Scale Summarization of Clinical Trial Descriptions

Renee D. White, Tristan Peng, Pann Sripitak, Alexander Rosenberg Johansen, Michael Snyder
DOI: https://doi.org/10.1145/3582515.3609559
2023-08-01
Abstract: A clinical trial is a study that evaluates new biomedical interventions. To design new trials, researchers draw inspiration from those current and completed. In 2022, there were on average more than 100 clinical trials submitted to ClinicalTrials.gov every day, with each trial having a mean of approximately 1500 words [1]. This makes it nearly impossible to keep up to date. To mitigate this issue, we have created a batch clinical trial summarizer called CliniDigest using GPT-3.5. CliniDigest is, to our knowledge, the first tool able to provide real-time, truthful, and comprehensive summaries of clinical trials. CliniDigest can reduce up to 85 clinical trial descriptions (approximately 10,500 words) into a concise 200-word summary with references and limited hallucinations. We have tested CliniDigest on its ability to summarize 457 trials divided across 27 medical subdomains. For each field, CliniDigest generates summaries of $\mu=153,\ \sigma=69$ words, each of which utilizes $\mu=54\%,\ \sigma=30\%$ of the sources. A more comprehensive evaluation is planned and outlined in this paper.
Computation and Language
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper addresses the difficulty of keeping track of the vast and growing body of clinical trial information. Specifically:

1. **Surge in the Number of Clinical Trials**: In 2022, an average of more than 100 clinical trials were submitted to ClinicalTrials.gov every day, with each trial averaging about 1500 words. This makes it hard for researchers to keep up with new trials in a timely manner.
2. **Need for Information Summarization**: To design new clinical trials, researchers draw inspiration from current and completed trials. Given the number of trials, reading and summarizing them manually is impractical.
3. **Need for Real-time, Accurate, and Comprehensive Summaries**: Researchers need a tool that can generate real-time, accurate, and comprehensive summaries of clinical trials so they can quickly understand developments in their fields.

To this end, the authors developed CliniDigest, a batch summarizer that uses GPT-3.5 to summarize clinical trial descriptions at scale. CliniDigest can compress up to 85 clinical trial descriptions (approximately 10,500 words) into a 200-word summary with references and limited hallucinations. The tool has been tested on 457 trials across 27 medical subdomains; the summaries average 153 words (standard deviation 69) and each uses, on average, 54% (standard deviation 30%) of its sources.
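
The two figures quoted above, the compression from roughly 10,500 words to 200 and the share of sources a summary uses, are simple ratios. The sketch below shows one way they could be computed. It is illustrative only: the function names are hypothetical, and it assumes references appear in a summary as bracketed numerals such as `[3]`, which the source does not specify.

```python
import re


def source_utilization(summary: str, num_sources: int) -> float:
    """Fraction of the input sources cited at least once in the summary.

    Assumption: citations are bracketed 1-based numerals, e.g. "[3]".
    """
    if num_sources <= 0:
        raise ValueError("num_sources must be positive")
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", summary)}
    # Ignore markers that point outside the list of sources.
    valid = {i for i in cited if 1 <= i <= num_sources}
    return len(valid) / num_sources


def compression_ratio(input_words: int, summary_words: int) -> float:
    """How many input words are condensed into each summary word."""
    if summary_words <= 0:
        raise ValueError("summary_words must be positive")
    return input_words / summary_words


# Made-up summary text, used only to exercise the functions.
example = (
    "Trials of drug A reduced symptoms [1][3]. "
    "A device study is ongoing [2]. See also [3]."
)
print(source_utilization(example, 4))   # 0.75 (sources 1, 2, 3 of 4)
print(compression_ratio(10_500, 200))   # 52.5
```

Under these assumptions, the upper-bound case from the abstract (about 10,500 words into 200) corresponds to a compression ratio of 52.5.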