Beyond Tokens: Fair Evaluation of French Large Language Models for Clinical Named Entity Recognition

Jamil Zaghir, Mina Bjelogrlic, Jean-Philippe Goldman, Adel Bensahla, Yuanyuan Zheng, Christian Lovis
DOI: https://doi.org/10.3233/SHTI240502
2024-08-22
Abstract: Named Entity Recognition (NER) models based on Transformers have gained prominence for their impressive performance in various languages and domains. This work delves into the often-overlooked aspect of entity-level metrics and exposes significant discrepancies between token-level and entity-level evaluations. The study utilizes a corpus of synthetic French oncological reports annotated with entities representing oncological morphologies. Four different French BERT-based models are fine-tuned for token classification, and their performance is rigorously assessed at both the token and entity levels. In addition to fine-tuning, we evaluate ChatGPT's ability to perform NER through prompt engineering techniques. The findings reveal a notable disparity in model effectiveness when transitioning from token-level to entity-level metrics, highlighting the importance of comprehensive evaluation methodologies in NER tasks. Furthermore, in comparison to BERT, ChatGPT remains limited when it comes to detecting advanced entities in French.
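To illustrate how token-level and entity-level scores can diverge, here is a minimal sketch in Python. It is not the authors' evaluation code: the BIO tagging scheme, the strict exact-span matching rule, the label name `MORPH`, and the example phrase are all assumptions chosen for illustration, and the abstract does not state which matching convention the paper uses.

```python
def extract_entities(tags):
    """Return a set of (start, end, label) spans from a BIO tag sequence.

    Assumption: stray I- tags with no open span of the same label are ignored.
    """
    entities = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            entities.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def f1(tp, n_pred, n_gold):
    """Harmonic mean of precision and recall, with 0.0 for empty cases."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def token_f1(gold, pred):
    """F1 over non-O tokens: a token is correct when its tag matches exactly."""
    tp = sum(g == p and g != "O" for g, p in zip(gold, pred))
    return f1(tp, sum(p != "O" for p in pred), sum(g != "O" for g in gold))


def entity_f1(gold, pred):
    """Strict F1: an entity is correct only if boundaries and label both match."""
    gold_e, pred_e = extract_entities(gold), extract_entities(pred)
    return f1(len(gold_e & pred_e), len(pred_e), len(gold_e))


# Hypothetical example: "adénocarcinome canalaire infiltrant du sein"
gold = ["B-MORPH", "I-MORPH", "I-MORPH", "O", "O"]
pred = ["B-MORPH", "I-MORPH", "O", "O", "O"]  # the model truncates the span

print(f"token-level F1:  {token_f1(gold, pred):.2f}")
print(f"entity-level F1: {entity_f1(gold, pred):.2f}")
```

In this toy case the prediction misses only the last token of a three-token entity, so two of three entity tokens are tagged correctly, yet under strict matching the entity itself counts as a miss. Multi-token clinical entities make this kind of gap more likely, which is one plausible reason the two metric families can tell different stories.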