Abstract:

Background: The bidirectional encoder representations from transformers (BERT) model has attracted considerable attention in clinical applications, such as patient classification and disease prediction. However, current studies have typically progressed to application development without a thorough assessment of the model's comprehension of clinical context. Furthermore, few comparative studies have been conducted on BERT models using medical documents from non-English-speaking countries. Therefore, the applicability of BERT models trained on English clinical notes to non-English contexts is yet to be confirmed. To address these gaps in the literature, this study focused on identifying the most effective BERT model for non-English clinical notes.

Objective: In this study, we evaluated the contextual understanding abilities of various BERT models applied to mixed Korean and English clinical notes, with the aim of identifying the BERT model that best understands the context of such documents.

Methods: Using data from 164,460 patients in a South Korean tertiary hospital, we pretrained BERT-base, BERT for Biomedical Text Mining (BioBERT), Korean BERT (KoBERT), and Multilingual BERT (M-BERT) to improve their contextual comprehension capabilities and subsequently compared their performance in seven fine-tuning tasks.

Results: Model performance varied based on the task and token usage. First, BERT-base and BioBERT excelled in tasks utilizing [CLS] token embeddings, such as document classification. BioBERT achieved the highest F1-score of 89.32. Both BERT-base and BioBERT demonstrated their effectiveness in document pattern recognition, even with limited Korean tokens in the dictionary. Second, M-BERT exhibited superior performance in reading comprehension tasks, achieving an F1-score of 93.77. Better results were obtained when fewer words were replaced with [UNK] tokens. Third, M-BERT excelled in the knowledge inference task, in which correct disease names were inferred from 63 candidate disease names in a document whose disease names had been replaced with [MASK] tokens. M-BERT achieved the highest hit@10 score of 95.41.

Conclusions: This study highlighted the effectiveness of various BERT models in a multilingual clinical domain. The findings can be used as a reference in clinical and language-based applications.
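
The hit@10 score reported for the knowledge inference task can be illustrated with a minimal sketch. It assumes that, for each masked document, a model assigns a score to every candidate disease name, and that hit@k is the percentage of documents whose correct name appears among the k highest-scoring candidates. The function names `rank_candidates` and `hit_at_k` and the toy inputs are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Order candidate disease names from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def hit_at_k(rankings: Sequence[Sequence[str]], gold: Sequence[str], k: int = 10) -> float:
    """Percentage of examples whose gold answer is among the top-k candidates.

    Assumption: the score is expressed as a percentage (0 to 100).
    """
    if len(rankings) != len(gold):
        raise ValueError("rankings and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for ranking, answer in zip(rankings, gold) if answer in ranking[:k])
    return 100.0 * hits / len(gold)


# Toy example with three masked documents and three candidates each
# (the study used 63 candidates; these names are placeholders).
toy_scores = [
    {"disease_a": 0.7, "disease_b": 0.2, "disease_c": 0.1},
    {"disease_a": 0.3, "disease_b": 0.6, "disease_c": 0.1},
    {"disease_a": 0.1, "disease_b": 0.3, "disease_c": 0.6},
]
toy_gold = ["disease_a", "disease_a", "disease_a"]
toy_rankings = [rank_candidates(s) for s in toy_scores]
```

With these toy inputs, `hit_at_k(toy_rankings, toy_gold, k=1)` counts only the first document as a hit, whereas `k=3` counts all three, because every candidate then falls inside the cutoff.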

Turkish Medical Text Classification Using BERT

Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish

Turkish Text Classification: From Lexicon Analysis to Bidirectional Transformer

Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation

KG-MTT-BERT: Knowledge Graph Enhanced BERT for Multi-Type Medical Text Classification

Accurate Medical Named Entity Recognition Through Specialized NLP Models

Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks

Improving Cancer Hallmark Classification with BERT-based Deep Learning Approach

New Arabic Medical Dataset for Diseases Classification

LegalTurk Optimized BERT for Multi-Label Text Classification and NER

Comparison of Pre-trained Language Models for Turkish Address Parsing

Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models

Medical Text Classification Based on an Optimized Machine Learning and External Semantic Resource

BERT2D: Two Dimensional Positional Embeddings for Efficient Turkish NLP

Optimizing classification of diseases through language model analysis of symptoms

Token Classification for Disambiguating Medical Abbreviations

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

A Dataset and BERT-based Models for Targeted Sentiment Analysis on Turkish Texts

Medical Dataset Classification for Kurdish Short Text over Social Media

Comparative Analysis of Text Classification Approaches in Electronic Health Records

Comparison of Classification Algorithms Used Medical Documents Categorization