Abstract:

Background: Named entity recognition (NER) models are essential for extracting structured information from unstructured medical texts by identifying entities such as diseases, treatments, and conditions, enhancing clinical decision-making and research. Innovations in machine learning, particularly those involving Bidirectional Encoder Representations From Transformers (BERT)-based deep learning and large language models, have significantly advanced NER capabilities. However, their performance varies across medical datasets due to the complexity and diversity of medical terminology. Previous studies have often focused on overall performance, neglecting specific challenges in medical contexts and the impact of macrofactors like lexical composition on prediction accuracy. These gaps hinder the development of optimized NER models for medical applications.

Objective: This study aimed to evaluate the performance of various NER models on medical text, focusing on how complex medical terminology affects entity recognition accuracy. We also explored the influence of macrofactors on model performance to provide insights for refining NER models and improving their reliability in medical applications.

Methods: This study evaluated 7 NER models (hidden Markov models, conditional random fields, BERT for Biomedical Text Mining, Big Transformer Models for Efficient Long-Sequence Attention, Decoding-enhanced BERT with Disentangled Attention, Robustly Optimized BERT Pretraining Approach, and Gemma) across 3 medical datasets: Revised Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA), BioCreative V CDR, and Anatomical Entity Mention (AnatEM). The evaluation focused on prediction accuracy, resource use (eg, central processing unit and graphics processing unit use), and the impact of fine-tuning hyperparameters. The macrofactors affecting model performance were also screened using the multilevel factor elimination algorithm.

Results: The fine-tuned BERT for Biomedical Text Mining, with balanced resource use, generally achieved the highest prediction accuracy on the Revised JNLPBA and AnatEM datasets, with microaverage (AVG_MICRO) scores of 0.932 and 0.8494, respectively, highlighting its proficiency in identifying medical entities. Gemma, fine-tuned using the low-rank adaptation technique, achieved the highest accuracy on the BioCreative V CDR dataset with an AVG_MICRO score of 0.9962 but exhibited variability across the other datasets (AVG_MICRO scores of 0.9088 on the Revised JNLPBA and 0.8029 on AnatEM), indicating a need for further optimization. In addition, our analysis revealed that 2 macrofactors, entity phrase length and the number of entity words in each entity phrase, significantly influenced model performance.

Conclusions: This study highlights the essential role of NER models in medical informatics and the need for model optimization through precise data targeting and fine-tuning. The insights from this study may improve clinical decision-making and support the creation of more sophisticated and effective medical NER models.
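As an illustration of the microaverage scores reported above, the following minimal Python sketch pools entity-level counts across all entity types before computing precision, recall, and F1. It rests on several assumptions that the abstract does not confirm: that tags follow the BIO scheme, that a prediction counts as correct only on an exact span and type match, and that AVG_MICRO refers to a score of this kind. The function names `extract_spans`, `micro_scores`, and `words_per_entity` are hypothetical, and `words_per_entity` reflects only one plausible reading of "number of entity words in each entity phrase".

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag sequence.

    `end` is exclusive. An I- tag with no matching open span starts a new span
    (a lenient reading of BIO, assumed here for illustration).
    """
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def micro_scores(gold_sents, pred_sents):
    """Micro-averaged precision, recall, and F1 over exact-match entity spans."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)   # spans predicted with correct boundaries and type
        fp += len(p - g)   # predicted spans absent from the gold annotation
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def words_per_entity(tags):
    """Token count of each entity phrase, ordered by start position."""
    return [end - start for start, end, _ in sorted(extract_spans(tags))]


# Toy data, invented for illustration only (not drawn from any dataset above).
gold = [["B-DNA", "I-DNA", "O", "B-protein"],
        ["O", "B-cell_type", "I-cell_type", "I-cell_type"]]
pred = [["B-DNA", "I-DNA", "O", "O"],
        ["O", "B-cell_type", "I-cell_type", "O"]]

precision, recall, f1 = micro_scores(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(words_per_entity(gold[1]))
```

Because counts are pooled before the ratios are taken, frequent entity types dominate a microaverage; a macroaverage would instead compute the score per type and then average. In the toy data, the truncated 3-word cell type phrase is scored as both a missed gold span and a spurious prediction, which hints at why longer entity phrases can be harder to match exactly.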

Improving Downstream Task Performance by Treating Numbers as Entities

Estimating Numbers without Regression

Do NLP Models Know Numbers? Probing Numeracy in Embeddings

Pre-training and Evaluation of Numeracy-Oriented Language Model

An Effective Framework to Help Large Language Models Handle Numeric-involved Long-context Tasks

Number Cookbook: Number Understanding of Language Models and How to Improve It

How to Leverage Digit Embeddings to Represent Numbers?

Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs

Multi-objective Representation for Numbers in Clinical Narratives Using CamemBERT-bio

Floating-Point Embedding: Enhancing the Mathematical Comprehension of Large Language Models

Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia

Enhancing Seq2seq Math Word Problem Solver with Entity Information and Math Knowledge

LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training

Improving Numerical Reasoning Skills in the Modular Approach for Complex Question Answering on Text

Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models

Improving NER in Social Media via Entity Type-Compatible Unknown Word Substitution

Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study

Numeral Understanding in Financial Tweets for Fine-Grained Crowd-Based Forecasting

FiNER: Financial Numeric Entity Recognition for XBRL Tagging

Interleaving Text and Number Embeddings to Solve Mathemathics Problems

Laying Anchors: Semantically Priming Numerals in Language Modeling