Optimizing Automatic Summarization of Long Clinical Records Using Dynamic Context Extension: Testing and Evaluation of the NBCE Method

Guoqing Zhang, Keita Fukuyama, Kazumasa Kishimoto, Tomohiro Kuroda
2024-11-13
Abstract: Summarizing patient clinical notes is vital for reducing documentation burdens, but manual summarization places a heavy load on medical staff. We propose an automatic method using LLMs. However, long inputs cause LLMs to lose context, which reduces output quality, especially in small models. We used a 7B model, open-calm-7b, enhanced with Naive Bayes-based Context Extension (NBCE) and a redesigned decoding mechanism that references one sentence at a time, keeping inputs within the 2048-token context window. Our improved model achieved near parity with Google's Gemini (over 175B parameters) on ROUGE-L metrics with 200 samples, indicating strong performance with fewer resources and improving the feasibility of automated EMR summarization.
Artificial Intelligence
What problem does this paper attempt to address?
The paper asks **how to optimize automatic summarization of long clinical records so as to reduce the manual summarization burden on medical staff and improve the quality and efficiency of the summaries**. It focuses on three specific problems:

1. **Inefficiency of manual summarization**: Medical staff currently spend a great deal of time summarizing patients' clinical records by hand. This increases their workload and can also lead to inaccurate summaries or omitted information.
2. **Limitations of existing automatic summarization methods**: Large language models (LLMs) perform well at text summarization, but on long clinical records the limited context window makes the model prone to losing context, which lowers output quality. The drop is especially pronounced for small-scale LLMs, such as models with 7B parameters.
3. **Resource and cost issues**: Ultra-large LLMs on cloud platforms (such as models with more than 175B parameters) can handle longer texts, but they are costly to deploy and use, and they carry data security and privacy risks. In addition, hospitals have limited hardware resources and can rarely run models of that size on site.

To address these problems, the paper proposes a method based on Dynamic Context Extension (DCE), combined with an improved decoding mechanism, that uses a smaller-scale LLM (open-calm-7b, with 7B parameters) to summarize clinical records efficiently.
Through this method, the paper aims to achieve the following goals:

- **Improve summarization quality**: Improve the context-handling mechanism so that the model maintains relatively high summarization quality on long clinical records.
- **Reduce costs and resource consumption**: Use a smaller-scale model to reduce dependence on expensive hardware and cloud computing resources, lowering deployment and operating costs.
- **Enhance data security and privacy protection**: Deploy the model locally so that patient data is never uploaded to the cloud.
- **Reduce communication latency**: A locally deployed model avoids network round trips, which can speed up clinical decision-making.

Overall, the goal of the paper is an automatic clinical record summarization system that is efficient, low-cost, secure, and suited to real medical environments.
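To make the context-extension idea more concrete, here is a minimal sketch of the Naive Bayes-based combination step as NBCE is commonly described. The long record is split into chunks that each fit the context window, the model scores the next token once per chunk and once with no chunk at all, and the lowest-entropy (most confident) chunk is combined with the context-free prior. The summary above does not spell out the paper's redesigned decoding mechanism, so this should not be read as the authors' implementation. The function name `nbce_combine`, the `beta` weight and its default, and the toy logits are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def entropy(logp):
    """Shannon entropy (in nats) of a distribution given as log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in logp)


def nbce_combine(chunk_logits, prior_logits, beta=0.25):
    """Combine per-chunk next-token predictions (illustrative sketch).

    chunk_logits: one list of vocabulary logits per context chunk, each
                  obtained by running the model on (chunk + prompt).
    prior_logits: logits from running the model on the prompt alone.
    beta:         hypothetical weight controlling how strongly the
                  context-free prior is subtracted.

    Returns (index of the selected chunk, combined log-probabilities).
    """
    chunk_logp = [log_softmax(l) for l in chunk_logits]
    prior_logp = log_softmax(prior_logits)

    # Min-entropy pooling: reference only the chunk the model is most sure about.
    k = min(range(len(chunk_logp)), key=lambda i: entropy(chunk_logp[i]))

    # (beta + 1) * log p(token | chunk_k) - beta * log p(token), then renormalise.
    combined = [(beta + 1.0) * c - beta * p
                for c, p in zip(chunk_logp[k], prior_logp)]
    return k, log_softmax(combined)


# Toy example with a 3-token vocabulary and two chunks (made-up numbers).
chunk_logits = [[0.0, 0.0, 0.0],   # chunk 0: uninformative, uniform prediction
                [5.0, 0.0, 0.0]]   # chunk 1: confident about token 0
prior_logits = [0.0, 0.0, 0.0]     # no-context prediction
selected, logp = nbce_combine(chunk_logits, prior_logits)
next_token = max(range(len(logp)), key=lambda i: logp[i])
print(selected, next_token)
```

Because every forward pass sees only one chunk plus the prompt, each input stays within the model's context window (2048 tokens in the paper), while the selected chunk can change from one decoding step to the next.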