Abstract: Objective: Clinical knowledge enriched transformer models (e.g., ClinicalBERT) achieve state-of-the-art results on clinical natural language processing (NLP) tasks. A core limitation of these models is the substantial memory consumption of their full self-attention mechanism, which restricts input length and degrades performance on long clinical texts. To overcome this, we propose to leverage long-sequence transformer models (e.g., Longformer and BigBird), which extend the maximum input sequence length from 512 to 4096 tokens, to better model long-term dependencies in long clinical texts. Materials and Methods: Inspired by the success of long-sequence transformer models and the fact that clinical notes are mostly long, we introduce two domain-enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on a large-scale clinical corpus. We evaluate both language models on 10 baseline tasks covering named entity recognition, question answering, natural language inference, and document classification. Results: Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT and other short-sequence transformers in all 10 downstream tasks and achieve new state-of-the-art results. Discussion: Our pre-trained language models provide the bedrock for clinical NLP using long texts. We have made our source code available at https://github.com/luoyuanlab/Clinical-Longformer, and the pre-trained models are available for public download at https://huggingface.co/yikuan8/Clinical-Longformer. Conclusion: This study demonstrates that clinical knowledge enriched long-sequence transformers are able to learn long-term dependencies in long clinical text. Our methods can also inspire the development of other domain-enriched long-sequence transformers.
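To make the memory argument in the abstract concrete, the following minimal sketch counts attention-score entries per head and layer for full self-attention versus a sliding-window pattern of the kind Longformer uses. It is an illustration under stated assumptions, not the authors' implementation: the window size of 512 is assumed, the function names are hypothetical, and the count ignores global-attention tokens, BigBird's random blocks, the number of heads and layers, and data type.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Number of (query, key) score entries when every token attends to every token."""
    return seq_len * seq_len


def sliding_window_pairs(seq_len: int, window: int) -> int:
    """Number of score entries when each token attends only to tokens within
    window // 2 positions on either side (itself included), clipped at the
    sequence boundaries. `window` is an assumed hyperparameter."""
    half = window // 2
    total = 0
    for i in range(seq_len):
        left = max(0, i - half)
        right = min(seq_len - 1, i + half)
        total += right - left + 1
    return total


if __name__ == "__main__":
    for n in (512, 4096):
        full = full_attention_pairs(n)
        local = sliding_window_pairs(n, window=512)  # assumed window size
        print(f"seq_len={n}: full={full}, sliding_window={local}, "
              f"ratio={full / local:.2f}")
```

Full attention grows quadratically with sequence length, while the sliding-window count grows roughly linearly for a fixed window, which is why extending the input from 512 to 4096 tokens is feasible with sparse attention patterns.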

Boosting classification reliability of NLP transformer models in the long run

Fine-tuning large neural language models for biomedical natural language processing

Can Fine-tuning Pre-trained Models Lead to Perfect NLP? A Study of the Generalizability of Relation Extraction.

BERTer: The Efficient One

Single task fine-tune BERT for text classification

Understanding Transformers for Bot Detection in Twitter

A Comparative Study of Pretrained Language Models for Long Clinical Text

An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text

Hierarchical Transformers for Long Document Classification

Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks

Fine-Tuning Large Language Models for Scientific Text Classification: A Comparative Study

Improving BERT-Based Text Classification With Auxiliary Sentence and Domain Knowledge

A new computationally efficient method to tune BERT networks – transfer learning

Fine-Tuning BERT for Sentiment Analysis of Vietnamese Reviews

Limitations of Transformers on Clinical Text Classification

Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT

Improved Fine-Tuning of In-Domain Transformer Model for Inferring COVID-19 Presence in Multi-Institutional Radiology Reports

Training BERT Models to Carry Over a Coding System Developed on One Corpus to Another

Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models

COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter

Enhancing Large Language Model Performance To Answer Questions and Extract Information More Accurately