MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining

Zhi Wen, Xing Han Lu, Siva Reddy
DOI: https://doi.org/10.18653/v1/2020.clinicalnlp-1.15
2020-12-28
Abstract: One of the biggest challenges that prohibit the use of many current NLP methods in clinical settings is the limited availability of public datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation and designed for natural language understanding pre-training in the medical domain. We pre-trained several models of common architectures on this dataset and showed empirically that such pre-training improves both performance and convergence speed when fine-tuning on downstream medical tasks.
Computation and Language, Artificial Intelligence, Machine Learning