Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data

Yuhao Chen, Zhimu Wang, Bo Wen, Farhana Zulkernine
2024-05-30
Abstract: Unstructured text in medical notes and dialogues contains rich information. Recent advancements in Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data, outperforming traditional text analysis approaches. However, there is a lack of scientific studies in the literature that methodically evaluate and report on the performance of different LLMs, specifically for domain-specific data such as medical chart notes. We propose an evaluation approach to analyze the performance of open-source LLMs such as Llama2 and Mistral for medical summarization tasks, using GPT-4 as an assessor. Our innovative approach to quantitative evaluation of LLMs can enable quality control, support the selection of effective LLMs for specific tasks, and advance knowledge discovery in digital health.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of evaluating and comparing open-source large language models (LLMs) on medical text summarization tasks. Traditional similarity metrics such as ROUGE and BERTScore score a generated summary by its similarity to a reference text, so they cannot accurately measure how well the response aligns with human intent. The researchers therefore propose an evaluation method that uses GPT-4 as an assessor to analyze the performance of open-source LLMs such as Llama2 and Mistral on medical summarization. Using uniform prompts, the two models are compared on three tasks (consumer health question summarization, biomedical query-based summarization, and dialogue summarization), with multiple publicly available datasets used for testing. GPT-4 judges which LLM produces the more effective summary and explains its choice. This approach helps select the most suitable LLM for a given task and supports knowledge discovery in digital health.
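The pairwise "LLM-as-judge" comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording in `JUDGE_TEMPLATE`, the "A/B/Tie on the first line" answer format, and the helper names `build_judge_prompt`, `parse_verdict` and `compare_models` are all hypothetical. The `judge` argument stands in for whatever function sends a prompt to GPT-4 and returns its text reply; no specific API client is assumed.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical prompt template; the paper's actual prompt is not reproduced here.
JUDGE_TEMPLATE = (
    "You are evaluating two summaries of the same medical text.\n"
    "Source text:\n{source}\n\n"
    "Summary A:\n{summary_a}\n\n"
    "Summary B:\n{summary_b}\n\n"
    "Which summary better captures the key information? "
    "Answer 'A', 'B', or 'Tie' on the first line, then explain briefly."
)


def build_judge_prompt(source: str, summary_a: str, summary_b: str) -> str:
    """Fill the hypothetical pairwise-comparison template."""
    return JUDGE_TEMPLATE.format(source=source, summary_a=summary_a, summary_b=summary_b)


def parse_verdict(reply: str) -> str:
    """Read the verdict from the first word of the first non-empty line."""
    for line in reply.splitlines():
        words = line.strip().split()
        if not words:
            continue  # skip blank lines before the verdict
        first = words[0].strip(".:*,").upper()
        return {"A": "A", "B": "B", "TIE": "Tie"}.get(first, "Invalid")
    return "Invalid"


def compare_models(
    examples: Iterable[tuple[str, str, str]],
    judge: Callable[[str], str],
) -> dict[str, float]:
    """Return the fraction of each verdict over (source, summary_a, summary_b) triples."""
    counts: Counter = Counter()
    for source, summary_a, summary_b in examples:
        reply = judge(build_judge_prompt(source, summary_a, summary_b))
        counts[parse_verdict(reply)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("A", "B", "Tie", "Invalid")}


def fake_judge(prompt: str) -> str:
    # Stand-in for a real GPT-4 call: it always answers "A".
    return "A\nSummary A is more complete."


data = [("source text", "summary from model 1", "summary from model 2")] * 4
print(compare_models(data, fake_judge))
# -> {'A': 1.0, 'B': 0.0, 'Tie': 0.0, 'Invalid': 0.0}
```

Keeping the judge as an injected function separates the tallying logic from any particular model API, so the same loop could compare Llama2 and Mistral outputs task by task. A common refinement, not described in this summary, is to query the judge a second time with the two summaries swapped to reduce position bias.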