Abstract: Objective: To explore and compare the performance of ChatGPT and other state-of-the-art large language models (LLMs) on domain-specific named entity recognition (NER) tasks covering different entity types and domains in the literature on traditional Chinese medicine (TCM) against COVID-19. Methods: We established a dataset of 389 articles on TCM against COVID-19 and manually annotated 48 of them with 6 types of entities belonging to 3 domains as the ground truth, against which the NER performance of LLMs can be assessed. We then performed NER tasks for the 6 entity types using ChatGPT (GPT-3.5 and GPT-4) and 4 state-of-the-art BERT-based question-answering (QA) models (RoBERTa, MiniLM, PubMedBERT, and SciBERT) without prior training on the specific task. A domain fine-tuned model (GSAP-NER) was also applied for a comprehensive comparison. Results: The overall performance of LLMs varied significantly between exact match and fuzzy match. In fuzzy match, ChatGPT surpassed BERT-based QA models in 5 out of 6 tasks, while in exact match, BERT-based QA models outperformed ChatGPT in 5 out of 6 tasks but with a smaller F-1 difference. GPT-4 showed a significant advantage over other models in fuzzy match, especially on the entity types TCM formula and Chinese patent drug (TFD) and ingredient (IG). Although GPT-4 outperformed BERT-based models on the entity types herb, target, and research method (RM), none of the F-1 scores exceeded 0.5. GSAP-NER outperformed GPT-4 in terms of F-1 by a slight margin on RM. ChatGPT achieved considerably higher recall than precision, particularly in fuzzy match. Conclusions: The NER performance of LLMs is highly dependent on the entity type, and their performance varies across application scenarios. ChatGPT could be a good choice for scenarios where high recall is favored. However, for knowledge acquisition in rigorous scenarios, neither ChatGPT nor BERT-based QA models are off-the-shelf tools for professional practitioners.
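The distinction between exact match and fuzzy match drives most of the results above, so a minimal sketch of how the two criteria can change precision, recall, and F-1 may help. The abstract does not define the fuzzy criterion; the sketch assumes that a prediction counts as a fuzzy match when it contains, or is contained in, a gold entity after lowercasing and whitespace normalization. The function names (`exact_match`, `fuzzy_match`, `prf1`) and the example entities are illustrative and are not taken from the study's data or code.

```python
from typing import Callable, Iterable, Tuple


def _normalize(entity: str) -> str:
    # Lowercase and collapse runs of whitespace before comparing.
    return " ".join(entity.lower().split())


def exact_match(pred: str, gold: str) -> bool:
    # Exact match: the normalized strings are identical.
    return _normalize(pred) == _normalize(gold)


def fuzzy_match(pred: str, gold: str) -> bool:
    # Assumed fuzzy criterion (not specified in the abstract):
    # one normalized string is a substring of the other.
    p, g = _normalize(pred), _normalize(gold)
    return bool(p) and bool(g) and (p in g or g in p)


def prf1(
    predicted: Iterable[str],
    gold: Iterable[str],
    match: Callable[[str, str], bool],
) -> Tuple[float, float, float]:
    predicted, gold = list(predicted), list(gold)
    # Precision: share of predictions that match at least one gold entity.
    hit_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    # Recall: share of gold entities matched by at least one prediction.
    hit_gold = sum(any(match(p, g) for p in predicted) for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative entities only; not drawn from the annotated corpus.
gold = ["Lianhua Qingwen capsule", "quercetin"]
predicted = ["Lianhua Qingwen", "quercetin", "ACE2"]

print("exact:", prf1(predicted, gold, exact_match))
print("fuzzy:", prf1(predicted, gold, fuzzy_match))
```

In this toy case the truncated prediction "Lianhua Qingwen" is rejected under exact match but accepted under the assumed fuzzy criterion, which raises both precision and recall. A model that tends to return partial or over-extended spans would therefore score noticeably better in fuzzy match than in exact match.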

Astro-NER -- Astronomy Named Entity Recognition: Is GPT a Good Domain Expert Annotator?

Textual Data Augmentation for NER in Geosciences with LLMs

AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model

Astronomical Knowledge Entity Extraction in Astrophysics Journal Articles via Large Language Models

AstroLLaMA: Towards Specialized Foundation Models in Astronomy

GPT-NER: Named Entity Recognition via Large Language Models

AstroMLab 1: Who Wins Astronomy Jeopardy!?

Large Language Models and Knowledge Graphs for Astronomical Entity Disambiguation

ASTRAL: Adversarial Trained LSTM-CNN for Named Entity Recognition

NanoNER: Named Entity Recognition for nanobiology using experts' knowledge and distant supervision

Harnessing the Power of Adversarial Prompting and Large Language Models for Robust Hypothesis Generation in Astronomy

A GPT-assisted iterative method for extracting domain knowledge from a large volume of literature of electromagnetic wave absorbing materials with limited manually annotated data

AI on AI: Exploring the Utility of GPT as an Expert Annotator of AI Publications

What is the Role of Large Language Models in the Evolution of Astronomy Research?

GSAP-NER: A Novel Task, Corpus, and Baseline for Scholarly Entity Extraction Focused on Machine Learning Models and Datasets

Utilizing Large Language Models for Named Entity Recognition in Traditional Chinese Medicine against COVID-19 Literature: Comparative Study

AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy

Development of a Language Model for Named-Entity-Recognition in Aerospace Requirements

Computer Science Named Entity Recognition in the Open Research Knowledge Graph

Incorporating Large Language Models into Named Entity Recognition: Opportunities and Challenges