Abstract:

Background: In the United States, 1 in 5 adults currently serves as a family caregiver for an individual with a serious illness or disability. Unlike professional caregivers, family caregivers often assume this role without formal preparation or training. Thus, there is an urgent need to enhance the capacity of family caregivers to provide quality care. Leveraging technology as an educational tool or an adjunct to care is a promising approach that has the potential to enhance the learning and caregiving capabilities of family caregivers. Large language models (LLMs) can potentially be used as a foundation technology for supporting caregivers. An LLM can be categorized as a foundation model (FM), which is a large-scale model trained on a broad data set that can be adapted to a range of different domain tasks. Despite their potential, FMs have the critical weakness of "hallucination," where the models generate information that can be misleading or inaccurate. Information reliability is essential when language models are deployed as front-line help tools for caregivers.

Objective: This study aimed to (1) develop a reliable caregiving language model (CaLM) by using FMs and a caregiving knowledge base, (2) develop an accessible CaLM using a small FM that requires fewer computing resources, and (3) evaluate the model's performance compared with a large FM.

Methods: We developed a CaLM using the retrieval-augmented generation (RAG) framework combined with FM fine-tuning to improve the quality of FM answers by grounding the model on a caregiving knowledge base. The key components of the CaLM are the caregiving knowledge base, a fine-tuned FM, and a retriever module. We used 2 small FMs as candidates for the foundation of the CaLM (LLaMA [large language model Meta AI] 2 and Falcon, each with 7 billion parameters) and adopted a large FM (GPT-3.5 with an estimated 175 billion parameters) as a benchmark. We developed the caregiving knowledge base by gathering various types of documents from the internet. We focused on caregivers of individuals with Alzheimer disease and related dementias. We evaluated the models' performance using benchmark metrics commonly used in evaluating language models, and we evaluated their reliability in providing accurate references with their answers.

Results: The RAG framework improved the performance of all FMs used in this study across all measures. As expected, the large FM performed better than the small FMs across all metrics. Interestingly, the small fine-tuned FMs with RAG performed significantly better than GPT-3.5 across all metrics. The fine-tuned LLaMA 2, a small FM, performed better than GPT-3.5 (even with RAG) in returning references with its answers.

Conclusions: The study shows that a reliable and accessible CaLM can be developed using small FMs with a knowledge base specific to the caregiving domain.
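The retrieve-then-generate flow named in the Methods (knowledge base, retriever module, FM) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the study's implementation: the three knowledge base entries are placeholder text, the retriever is a simple bag-of-words cosine similarity rather than whatever retriever the authors used, and `generate` is a hypothetical stub standing in for a call to a fine-tuned FM.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder entries; NOT taken from the study's knowledge base.
KNOWLEDGE_BASE = [
    {"source": "doc-1",
     "text": "A consistent daily routine can help reduce confusion for a person with dementia."},
    {"source": "doc-2",
     "text": "Caregivers should schedule regular breaks and ask for respite support to avoid burnout."},
    {"source": "doc-3",
     "text": "Remove loose rugs and improve lighting to lower the risk of falls at home."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, k=2):
    """Return the k knowledge base entries most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda d: cosine(q, Counter(tokenize(d["text"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Ground the question in the retrieved passages, tagged by source id."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using only the context below and cite the source ids.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


def generate(prompt):
    """Hypothetical stub; a real system would call a fine-tuned FM here."""
    return "(model output would appear here)"


def answer(question, k=2):
    """Retrieve passages, build a grounded prompt, and return answer plus references."""
    passages = retrieve(question, k)
    prompt = build_prompt(question, passages)
    return {"answer": generate(prompt),
            "references": [p["source"] for p in passages]}
```

In this sketch the references are taken directly from the retriever output, so they always point at real knowledge base entries. That is a simplification: the study instead evaluates how reliably the models themselves return references with their answers.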

Fine-tuning large language models for effective nutrition support in residential aged care: a domain expertise approach

Enhancing Early Detection of Cognitive Decline in the Elderly: A Comparative Study Utilizing Large Language Models in Clinical Notes

Evaluating approaches of training a generative large language model for multi-label classification of unstructured electronic health records

Improving Clinical Expertise in Large Language Models Using Electronic Medical Records

Extracting Lifestyle Factors for Alzheimer's Disease from Clinical Notes Using Deep Learning with Weak Supervision

Extracting Symptoms of Agitation in Dementia from Free-Text Nursing Notes Using Advanced Natural Language Processing

Applying generative AI with retrieval augmented generation to summarize and extract key clinical information from electronic health records

Optimizing large language models in digestive disease: strategies and challenges to improve clinical outcomes

Use of a large language model with instruction-tuning for reliable clinical frailty scoring

Fine-Tuning Large Language Models to Enhance Programmatic Assessment in Graduate Medical Education

Efficient Fine-Tuning of Large Language Models for Automated Medical Documentation

A Reliable and Accessible Caregiving Language Model (CaLM) to Support Tools for Caregivers: Development and Evaluation Study

Extraction of Geriatric Syndromes From Electronic Health Record Clinical Notes: Assessment of Statistical Natural Language Processing Methods

Dementia risk prediction using decision-focused content selection from medical notes

Fine-Tuning LLMs for Reliable Medical Question-Answering Services

Advancing entity recognition in biomedicine via instruction tuning of large language models

A Comparative Study of Pretrained Language Models for Long Clinical Text

Enhancing Medical Specialty Assignment to Patients using NLP Techniques

Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare

Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes