Exploring Robustness in Doctor-Patient Conversation Summarization: An Analysis of Out-of-Domain SOAP Notes

Yu-Wen Chen, Julia Hirschberg

2024-06-05

Abstract: Summarizing medical conversations poses unique challenges due to the specialized domain and the difficulty of collecting in-domain training data. In this study, we investigate the performance of state-of-the-art generative summarization models for doctor-patient conversations on out-of-domain data. We divide the summarization models into two configurations: (1) a general model, which does not specify subjective (S), objective (O), assessment (A), and plan (P) notes; (2) a SOAP-oriented model that generates a summary with SOAP sections. We analyze the limitations and strengths of fine-tuned language-model-based methods and GPTs in both configurations. We also conduct a Linguistic Inquiry and Word Count (LIWC) analysis to compare the SOAP notes from different datasets. The results show a strong correlation between reference notes across datasets, indicating that format mismatch (i.e., discrepancies in word distribution) is not the main cause of the performance decline on out-of-domain data. Lastly, a detailed analysis of SOAP notes provides insights into the information the models omit and the hallucinations they introduce.

Computation and Language, Machine Learning

What problem does this paper attempt to address?

This paper addresses the robustness of doctor-patient conversation summarization, in particular how models perform across datasets (i.e., on out-of-domain data). It focuses on two points:

1. **Cross-dataset performance evaluation**: The authors evaluate state-of-the-art generative summarization models for doctor-patient conversations on out-of-domain data. They analyze the limitations and strengths of these models under two configurations: a general model (which does not specify subjective, objective, assessment, and plan notes) and a SOAP-oriented model (which generates summaries with SOAP sections).
2. **Information omission and hallucination**: The authors examine which types of information the general model tends to omit from its summaries, especially objective information. They also analyze the hallucinations the SOAP-oriented model may produce when the input conversation contains no information for a given category, that is, generated content that does not appear in the conversation.

Through these analyses, the authors hope to provide insights that guide future research toward robust doctor-patient conversation summarization models suitable for practical use.
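The abstract's cross-dataset comparison of reference notes can be illustrated with a rough sketch: compute a word-category frequency profile for each dataset's notes, then correlate the two profiles. This is not the authors' pipeline. The LIWC dictionary is proprietary, so the `CATEGORIES` lexicon below is a small hypothetical stand-in, and `category_profile`, `pearson`, and the sample notes are invented for illustration only.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon (assumption); the real LIWC dictionary is
# proprietary and far larger.
CATEGORIES = {
    "health": {"pain", "fever", "cough", "medication"},
    "negation": {"no", "not", "denies"},
    "time": {"today", "days", "weeks"},
    "body": {"chest", "head", "lungs"},
}


def category_profile(notes):
    """Return the fraction of tokens in each category, in CATEGORIES order."""
    counts = Counter()
    total = 0
    for note in notes:
        for raw in note.lower().split():
            token = raw.strip(".,;:()")
            if not token:
                continue
            total += 1
            for name, words in CATEGORIES.items():
                if token in words:
                    counts[name] += 1
    return [counts[name] / total if total else 0.0 for name in CATEGORIES]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (norm_x * norm_y)


# Invented sample notes standing in for two datasets' reference notes.
dataset_a = ["Patient denies fever. Chest pain for three days."]
dataset_b = ["No cough today. Reports head pain and chest pain for two weeks."]

r = pearson(category_profile(dataset_a), category_profile(dataset_b))
print(f"profile correlation: {r:.3f}")
```

A high correlation between such profiles would suggest that the two sets of notes use similar word distributions, which is the kind of evidence the abstract cites when it argues that format mismatch is not the main cause of the out-of-domain performance drop.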

Generating SOAP Notes from Doctor-Patient Conversations Using Modular Summarization Techniques

Towards an Automated SOAP Note: Classifying Utterances from Medical Conversations

Dr. Summarize: Global Summarization of Medical Dialogue by Exploiting Local Structures

Generating medically-accurate summaries of patient-provider dialogue: A multi-stage approach using large language models

Extrinsically-Focused Evaluation of Omissions in Medical Summarization

A Factual Aware Two-Stage Model for Medical Dialogue Summarization.

Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data

CLINICSUM: Utilizing Language Models for Generating Clinical Summaries from Patient-Doctor Conversations

MedicalSum: A Guided Clinical Abstractive Summarization Model for Generating Medical Reports from Patient-Doctor Conversations

Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations

Summarizing Patients Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models

Query-Guided Self-Supervised Summarization of Nursing Notes

Improving Clinical Note Generation from Complex Doctor-Patient Conversation

uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization

Optimizing Automatic Summarization of Long Clinical Records Using Dynamic Context Extension: Testing and Evaluation of the NBCE Method

Exploring Optimal Granularity for Extractive Summarization of Unstructured Health Records: Analysis of the Largest Multi-Institutional Archive of Health Records in Japan

Automatic analysis of medical dialogue in the home hemodialysis domain: Structure induction and summarization

Towards Efficient Medical Dialogue Summarization with Compacting-Abstractive Model.

Comparing Two Model Designs for Clinical Note Generation; Is an LLM a Useful Evaluator of Consistency?

A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models