Abstract: Medication Extraction and Mining play an important role in healthcare NLP research due to their practical applications in hospital settings, such as mapping extracted terms to standard clinical knowledge bases (SNOMED-CT, BNF, etc.). In this work, we investigate state-of-the-art LLMs in text mining tasks on medications and their related attributes such as dosage, route, strength, and adverse effects. In addition, we explore different ensemble learning methods (\textsc{Stack-Ensemble} and \textsc{Voting-Ensemble}) to augment the performance of individual LLMs. Our ensemble learning results show better performance than the individually fine-tuned base models BERT, RoBERTa, RoBERTa-L, BioBERT, BioClinicalBERT, BioMedRoBERTa, ClinicalBERT, and PubMedBERT across general and specific domains. Finally, we build an entity linking function to map extracted medical terminologies to SNOMED-CT codes and British National Formulary (BNF) codes, which are further mapped to the Dictionary of Medicines and Devices (dm+d) and ICD. Our model's toolkit and desktop applications are publicly available at \url{https://github.com/HECTA-UoM/ensemble-NER}.

What problem does this paper attempt to address?

The main problem this paper addresses lies in medical natural language processing (NLP): how to effectively extract medications and their related attributes (such as dosage, route of administration, strength, and adverse effects) from unstructured text and automatically map them to standard clinical knowledge bases (such as SNOMED-CT and BNF). Specifically, the paper explores the following aspects:

1. **Drug Information Extraction and Mining**:
   - Extract drug names and their related attributes (dosage, route of administration, strength, adverse effects, frequency, duration, dosage form, reason, etc.) from medical texts.
   - Automatically map the extracted terms to standard clinical terminologies (such as SNOMED-CT and BNF) to achieve automated clinical coding.
2. **Model Performance Improvement**:
   - Evaluate the performance of state-of-the-art large language models (LLMs) on drug information extraction tasks.
   - Explore different ensemble learning methods (STACK-ENSEMBLE and VOTING-ENSEMBLE) to improve on the performance of a single LLM.
3. **Application of Ensemble Learning**:
   - Improve the accuracy of named entity recognition (NER) by combining multiple pre-trained language models (such as BERT, RoBERTa, BioBERT, and ClinicalBERT).
   - Compare the two ensemble strategies (voting and stacking) and evaluate their performance on clinical texts.
4. **Entity Linking Function**:
   - Build an entity linking function that maps the extracted medical terms to SNOMED-CT codes and British National Formulary (BNF) codes, and further maps them to the Dictionary of Medicines and Devices (dm+d) and the International Classification of Diseases (ICD).
5. **User Tool Development**:
   - Develop desktop applications and Web interfaces so that users can conveniently apply these models for drug information extraction and entity linking.
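The voting strategy mentioned above can be illustrated with a minimal sketch. It assumes each fine-tuned model has already produced one label per token and combines them by strict majority; the model names, label names, and the `majority_vote` helper are hypothetical and are not taken from the paper's toolkit, whose exact voting rule may differ.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(predictions: Dict[str, List[str]], fallback: str = "O") -> List[str]:
    """Combine per-token labels from several models by strict majority vote.

    `predictions` maps a (hypothetical) model name to its label sequence.
    A label wins only if more than half of the models agree on it;
    otherwise the token receives `fallback` (the "outside" tag by default).
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must label the same number of tokens")

    voted = []
    for token_labels in zip(*predictions.values()):
        label, votes = Counter(token_labels).most_common(1)[0]
        voted.append(label if votes > len(token_labels) / 2 else fallback)
    return voted


# Illustrative input only: invented labels for "aspirin 81 mg daily".
predictions = {
    "model_a": ["B-Drug", "B-Strength", "I-Strength", "B-Frequency"],
    "model_b": ["B-Drug", "B-Dosage", "I-Strength", "B-Frequency"],
    "model_c": ["B-Drug", "B-Strength", "I-Strength", "O"],
}
ensemble_labels = majority_vote(predictions)
print(ensemble_labels)
```

With three models, a single dissenting prediction (as on the second and fourth tokens here) is overruled by the other two. A stacking ensemble would instead train a second-stage model on the base models' outputs rather than counting votes.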
### Formula Summary

The formulas in the paper are used to evaluate model performance and comprise the following metrics:

- **Precision**: \[ P = \frac{TP}{TP + FP} \] where \(TP\) is the number of true positives and \(FP\) is the number of false positives.
- **Recall**: \[ R = \frac{TP}{TP + FN} \] where \(FN\) is the number of false negatives.
- **F1 Score**: \[ F1 = 2 \times \frac{P \times R}{P + R} \]
- **Accuracy**: \[ Acc = \frac{TP + TN}{TP + TN + FP + FN} \] where \(TN\) is the number of true negatives.

With these metrics, the paper evaluates the performance of different models and ensemble methods on drug information extraction tasks and demonstrates the effectiveness of ensemble learning.
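As a worked illustration of the four metrics, the following sketch computes them from raw counts. The function name `evaluation_metrics` and the example counts are assumptions made for illustration; they are not results from the paper.

```python
from typing import Dict


def evaluation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute precision, recall, F1, and accuracy from confusion counts.

    Each ratio is defined as 0.0 when its denominator is zero, which is
    a common convention rather than something specified in the paper.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = evaluation_metrics(tp=8, fp=2, fn=4, tn=86)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these counts, \(P = 8/10\), \(R = 8/12\), \(F1 = 16/22\), and \(Acc = 94/100\). The gap between accuracy and F1 shows why accuracy alone can be misleading when most tokens are true negatives.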

INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning
