Exploring Robustness in Doctor-Patient Conversation Summarization: An Analysis of Out-of-Domain SOAP Notes

Yu-Wen Chen, Julia Hirschberg
2024-06-05
Abstract: Summarizing medical conversations poses unique challenges due to the specialized domain and the difficulty of collecting in-domain training data. In this study, we investigate the performance of state-of-the-art generative summarization models for doctor-patient conversations on out-of-domain data. We divide these summarization models into two configurations: (1) a general model, which does not specify subjective (S), objective (O), assessment (A), and plan (P) notes; (2) a SOAP-oriented model, which generates a summary with SOAP sections. We analyze the limitations and strengths of fine-tuned language-model-based methods and GPTs in both configurations. We also conduct a Linguistic Inquiry and Word Count analysis to compare the SOAP notes from different datasets. The results exhibit a strong correlation for reference notes across different datasets, indicating that format mismatch (i.e., discrepancies in word distribution) is not the main cause of performance decline on out-of-domain data. Lastly, a detailed analysis of SOAP notes is included to provide insights into missing information and hallucinations introduced by the models.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the robustness of doctor-patient conversation summarization, in particular how well models perform across datasets (i.e., on out-of-domain data). Specifically, the research focuses on the following points:

1. **Cross-dataset performance evaluation**: The researchers examine how state-of-the-art generative summarization models for doctor-patient conversations perform on out-of-domain data. This includes analyzing the limitations and strengths of these models under two configurations: the general model (which does not specify subjective, objective, assessment, and plan notes) and the SOAP-oriented model (which generates summaries with SOAP sections).
2. **Information omission and hallucination**: The research examines which types of information the general model tends to omit when generating summaries, especially objective information. It also analyzes the hallucinations the SOAP-oriented model may produce when the input conversation contains no information for a given SOAP category, that is, generated content that does not appear in the conversation.

Through these analyses, the authors hope to provide insights that guide future research toward robust doctor-patient conversation summarization models suitable for practical scenarios.