Abstract: Electronic Medical Records (EMRs) are largely unstructured and often written in natural language. Information Extraction (IE) can be used to acquire knowledge from such texts, including the automatic recognition of meaningful entities through models for Named Entity Recognition (NER). However, most work on this task has targeted English. This study tests different methods on Portuguese text, specifically in the domain of Neurology, and draws conclusions from the results. The paper compares Conditional Random Fields (CRF), bidirectional Long Short-Term Memory with Conditional Random Fields (BiLSTM-CRF), and a BiLSTM-CRF with residual learning connections, using not only Portuguese texts from medical journals but also texts from the Coimbra Hospital and Universitary Centre (CHUC) Neurology Service. It also compares the performance of BiLSTM-CRF models using word embeddings (WEs) trained on clinical text against WEs trained on general language text. On texts extracted from the medical journal, the deep learning models achieved F1-Scores of nearly 83% under relaxed evaluation and 75% under strict evaluation. On texts collected from the Hospital, the same models achieved F1-Scores of nearly 71% and 62%, respectively. This work concludes that deep learning models outperform shallow learning models and that in-domain WEs yield better results than general language WEs, even when the latter are trained on much more text than the former. Furthermore, the results show that it is possible to extract information from Hospital clinical texts with models trained on clinical cases extracted from medical journals, which are openly available. Nevertheless, such results still require a healthcare technician to verify that the information was extracted correctly.
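The abstract reports F1-Scores under both relaxed and strict evaluation without defining them. The sketch below illustrates one common reading, and it is an assumption rather than the paper's confirmed protocol: strict evaluation counts a predicted entity as correct only when its boundaries and label match a gold entity exactly, while relaxed evaluation also accepts any overlap with a gold entity of the same label. The names `span_f1` and `Span`, and the labels `Diagnosis` and `Drug`, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical span representation: (start token, end token exclusive, label).
Span = Tuple[int, int, str]


def _overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a label and at least one token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def _exact(a: Span, b: Span) -> bool:
    """True when two spans have identical boundaries and label."""
    return a == b


def span_f1(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """Entity-level F1. Assumed definitions: strict = exact match,
    relaxed = same label with overlapping boundaries."""
    match = _overlaps if relaxed else _exact
    # Precision counts predictions that match some gold span;
    # recall counts gold spans matched by some prediction.
    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the first prediction covers only part of the gold entity.
gold = [(0, 2, "Diagnosis"), (5, 6, "Drug")]
pred = [(0, 1, "Diagnosis"), (5, 6, "Drug")]
strict_score = span_f1(gold, pred)
relaxed_score = span_f1(gold, pred, relaxed=True)
```

Under these assumed definitions the partially correct boundary is penalized only in the strict setting, which is consistent with the strict scores in the abstract being lower than the relaxed ones.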

Comparing Different Methods for Named Entity Recognition in Portuguese Neurology Text
