Abstract: \textbf{Objective:} We aimed to develop an advanced multi-task large language model (LLM) framework to extract multiple types of information about dietary supplements (DS) from clinical records. \textbf{Methods:} We used four core DS information extraction tasks as our multiple tasks: named entity recognition (NER: 2,949 clinical sentences), relation extraction (RE: 4,892 sentences), triple extraction (TE: 2,949 sentences), and usage classification (UC: 2,460 sentences). We introduced a novel Retrieval-Augmented Multi-task Information Extraction (RAMIE) framework, which 1) employs instruction fine-tuning with task-specific prompts, 2) trains LLMs on multiple tasks jointly, improving storage efficiency and lowering training costs, and 3) incorporates retrieval-augmented generation (RAG) by retrieving similar examples from the training set. We compared RAMIE's performance with that of LLMs using instruction fine-tuning alone and conducted an ablation study to assess the contributions of multi-task learning (MTL) and RAG to multi-task performance. \textbf{Results:} With the RAMIE framework, Llama2-13B achieved an F1 score of 87.39 on the NER task (a 3.51\% improvement) and 93.74 on the RE task (a 1.15\% improvement). On the TE task, Llama2-7B scored 79.45 (a 14.26\% improvement), and MedAlpaca-7B achieved the highest F1 score, 93.45 (a 0.94\% improvement), on the UC task. The ablation study revealed that MTL increased efficiency at a slight cost in performance, whereas RAG significantly boosted overall accuracy. \textbf{Conclusion:} This study presents a novel RAMIE framework that substantially improves multi-task extraction of DS-related information from clinical records. Our framework can potentially be applied to other domains.
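
The RAG step described in the Methods (retrieving similar training examples and placing them next to a task-specific instruction) can be sketched as below. This is a minimal illustration, not the paper's implementation: the bag-of-words cosine similarity is a stand-in for whatever retriever RAMIE actually uses, and `TASK_INSTRUCTIONS`, `retrieve_examples`, and `build_prompt` are hypothetical names with invented prompt wording.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical instruction wording, one per task (NOT the paper's actual prompts).
TASK_INSTRUCTIONS = {
    "NER": "Extract dietary supplement and adverse event entities from the sentence.",
    "RE": "Classify the relation between the supplement and the event in the sentence.",
    "TE": "Extract (subject, predicate, object) triples from the sentence.",
    "UC": "Classify the supplement usage status as start, continue, stop, or uncertain.",
}

Example = Tuple[str, str]  # (input sentence, gold output) drawn from the training set


def bag_of_words(text: str) -> Counter:
    """Lowercase tokens with surrounding punctuation stripped (stand-in retriever features)."""
    tokens = (tok.strip(".,;:()\"'").lower() for tok in text.split())
    return Counter(tok for tok in tokens if tok)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_examples(query: str, train_set: List[Example], k: int = 2) -> List[Example]:
    """Return the k training examples whose inputs are most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(train_set, key=lambda ex: cosine(q, bag_of_words(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(task: str, query: str, train_set: List[Example], k: int = 2) -> str:
    """Task instruction, then retrieved demonstrations, then the new input to label."""
    parts = [TASK_INSTRUCTIONS[task]]
    for inp, out in retrieve_examples(query, train_set, k):
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Toy usage on invented UC sentences.
train = [
    ("The patient stopped taking fish oil due to side effects.", "stop"),
    ("She continues to take vitamin D daily.", "continue"),
    ("He started melatonin for insomnia.", "start"),
]
print(build_prompt("UC", "Patient stopped fish oil last week.", train, k=1))
```

In this toy run only the fish-oil sentence shares words with the query, so it is the single demonstration placed before the new input; a dense retriever would rank by embedding similarity instead of word overlap.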

What problem does this paper attempt to address?

This paper addresses the extraction of multiple types of information about dietary supplements (DS) from clinical records. Specifically, it aims to develop an advanced multi-task large language model (LLM) framework that efficiently extracts DS-related information from clinical records. The framework covers four tasks:

1. **Named Entity Recognition (NER)**: Identify and classify dietary supplement entities and adverse events (AEs) in the text. For example, in the sentence "The patient reported taking cranberry juice for a urinary tract infection", the model needs to tag "cranberry juice" as a dietary supplement and "urinary tract infection" as an adverse event.
2. **Relation Extraction (RE)**: Determine the relationships between the identified entities. For example, in the sentence "The patient experienced nausea after taking ginseng", the model needs to identify the negative relationship between "ginseng" and "nausea".
3. **Triple Extraction (TE)**: Structure the information as "subject-predicate-object" triples. For example, from the sentence "Cranberry is used to prevent urinary tract infections", the model needs to extract the triple (Cranberry, has_indication, urinary tract infections).
4. **Usage Classification (UC)**: Classify the usage status (start, continue, stop, or uncertain) of dietary supplements described in clinical records. For example, in the sentence "The patient stopped taking fish oil due to side effects", the model needs to classify the usage status as "stop".

### Background and Challenges

Dietary supplements play an important role in promoting health and wellness, but their quality and safety raise many concerns. Because dietary supplements are classified as food rather than drugs, they are not as strictly regulated by the FDA, which leads to insufficient ingredient transparency and a lack of rigorous clinical trials and mechanistic research, and may therefore result in adverse events.

Clinical records contain a large amount of information about dietary supplements and their adverse events, which is of great value for public health, medical research, and regulation. However, this information is usually embedded in the unstructured text of electronic health records (EHRs), and advanced information extraction methods are required to identify the relevant entities, events, and relationships comprehensively and accurately.

### Limitations of Existing Research

Although some studies have applied natural language processing (NLP) techniques to dietary supplement information in text, these methods remain limited when handling complex clinical text with multiple entity and relation types. For example, Bi-LSTM and BERT models perform poorly on unseen or complex clinical text. Recently, large language models (LLMs) such as the GPT and Llama series have made significant progress in artificial intelligence and have proven effective in health-record analysis and information extraction tasks. However, their application to DS-related information extraction is still at an exploratory stage.

### Main Contributions of the Paper

1. **First Exploration**: This is the first exploration of the potential of LLMs for multi-task information extraction on dietary supplements, covering the NER, RE, TE, and UC tasks.
2. **Proposing the RAMIE Framework**: The paper proposes the Retrieval-Augmented Multi-task Information Extraction (RAMIE) framework, which improves extraction accuracy, model efficiency, and scalability through multi-task learning (MTL), retrieval-augmented generation (RAG), and instruction fine-tuning.
3. **Comprehensive Experiments**: On 8 state-of-the-art...
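
The MTL component trains one model across all four tasks instead of one model per task, which is where the storage and training-cost savings come from. A minimal sketch of pooling per-task data into a single instruction-tuning set follows, assuming a simple instruction/input/output record format; `INSTRUCTIONS`, `format_triples`, `to_instruction_records`, and `build_multitask_set` are hypothetical names, the prompt wording is invented, and the fine-tuning step itself is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical instruction prefixes, one per task (NOT the paper's exact prompts).
INSTRUCTIONS = {
    "NER": "Tag dietary supplement and adverse event mentions in the sentence.",
    "RE": "Classify the relation between the supplement and the event.",
    "TE": "Extract (subject, predicate, object) triples from the sentence.",
    "UC": "Classify supplement usage as start, continue, stop, or uncertain.",
}

Triple = Tuple[str, str, str]
Label = Union[str, List[Triple]]  # simplification: NER/RE/UC labels kept as plain strings


def format_triples(triples: Sequence[Triple]) -> str:
    """Serialize triples as '(s, p, o); (s, p, o)' so a generative model can emit them."""
    return "; ".join(f"({s}, {p}, {o})" for s, p, o in triples)


def to_instruction_records(task: str, examples: Sequence[Tuple[str, Label]]) -> List[Dict[str, str]]:
    """Wrap one task's (sentence, label) pairs as instruction-tuning records."""
    records = []
    for sentence, label in examples:
        target = format_triples(label) if task == "TE" else str(label)
        records.append({"task": task, "instruction": INSTRUCTIONS[task],
                        "input": sentence, "output": target})
    return records


def build_multitask_set(datasets: Dict[str, Sequence[Tuple[str, Label]]],
                        seed: int = 0) -> List[Dict[str, str]]:
    """Pool all tasks into one shuffled training set for a single shared model."""
    pooled: List[Dict[str, str]] = []
    for task, examples in datasets.items():
        pooled.extend(to_instruction_records(task, examples))
    random.Random(seed).shuffle(pooled)  # interleave tasks; fixed seed for reproducibility
    return pooled


# Toy usage with example sentences from the task descriptions.
datasets = {
    "TE": [("Cranberry is used to prevent urinary tract infections.",
            [("Cranberry", "has_indication", "urinary tract infections")])],
    "UC": [("The patient stopped taking fish oil due to side effects.", "stop")],
}
for record in build_multitask_set(datasets):
    print(record["task"], "->", record["output"])
```

Uniform shuffling is only one way to mix tasks; whether RAMIE shuffles, balances, or samples tasks differently is not stated in this summary.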

RAMIE: Retrieval-Augmented Multi-task Information Extraction with Large Language Models on Dietary Supplements

Improving Clinical Expertise in Large Language Models Using Electronic Medical Records

LLMs in Biomedicine: A study on clinical Named Entity Recognition

RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain

Enhancing Clinical Data Extraction from Pathology Reports: A Comparative Analysis of Large Language Models

MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models

BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine

Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models

Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness

RAM2C: A Liberal Arts Educational Chatbot based on Retrieval-augmented Multi-role Multi-expert Collaboration

Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports

Information Extraction from Clinical Notes: Are We Ready to Switch to Large Language Models?

Lingdan: enhancing encoding of traditional Chinese medicine knowledge for clinical reasoning tasks with large language models

From Text to Tables: A Local Privacy Preserving Large Language Model for Structured Information Retrieval from Medical Documents

Evaluating approaches of training a generative large language model for multi-label classification of unstructured electronic health records

REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models

Fine-tuning large language models for effective nutrition support in residential aged care: a domain expertise approach

INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning

Empowering PET Imaging Reporting with Retrieval-Augmented Large Language Models and Reading Reports Database: A Pilot Single Center Study

LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing