Evaluating approaches of training a generative large language model for multi-label classification of unstructured electronic health records

Dinithi Vithanage,Chao Deng,Lei Wang,Mengyang Yin,Mohammad Alkhalaf,Zhenyua Zhang,Yunshu Zhu,Alan Christy Soewargo,Ping Yu
DOI: https://doi.org/10.1101/2024.06.24.24309441
2024-10-23
Abstract:Multi-label classification of unstructured electronic health records (EHR) is challenging due to the semantic complexity of textual data. Identifying the most effective machine learning method for EHR classification is useful in real-world clinical settings. Advances in natural language processing (NLP) using large language models (LLMs) offer promising solutions. Therefore, this experimental research aims to test the effects of zero-shot and few-shot learning prompting, with and without parameter-efficient fine-tuning (PEFT) and retrieval-augmented generation (RAG) of LLMs, on the multi-label classification of unstructured EHR data from residential aged care facilities (RACFs) in Australia. The four clinical tasks examined are agitation in dementia, depression in dementia, frailty index, and malnutrition risk factors, using the Llama 3.1-8B. Performance evaluation includes accuracy, macro-averaged precision, recall, and F1 score, supported by non-parametric statistical analyses. Results indicate that both zero-shot and few-shot learning, regardless of the use of PEFT and RAG, demonstrate equivalent performance across the clinical tasks when using the same prompting template. Few-shot learning consistently outperforms zero-shot learning when neither PEFT nor RAG is applied. Notably, PEFT significantly enhances model performance in both zero-shot and few-shot learning; however, RAG improves performance only in few-shot learning. After PEFT, the performance of zero-shot learning is equal to that of few-shot learning across clinical tasks. Additionally, few-shot learning with RAG surpasses zero-shot learning with RAG, while no significant difference exists between few-shot learning with RAG and zero-shot learning with PEFT. These findings offer crucial insights into LLMs for researchers, practitioners, and stakeholders utilizing LLMs in clinical document analysis.
Health Informatics
What problem does this paper attempt to address?