Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation

Junyu Luo, Zifei Zheng, Hanzhong Ye, Muchao Ye, Yaqing Wang, Quanzeng You, Cao Xiao, Fenglong Ma
2023-09-22
Abstract: Patients with low health literacy usually have difficulty understanding medical jargon and the complex structure of professional medical language. Although some studies are proposed to automatically translate expert language into layperson-understandable language, only a few of them focus on both accuracy and readability aspects simultaneously in the clinical domain. Thus, simplification of the clinical language is still a challenging task, but unfortunately, it is not yet fully addressed in previous work. To benchmark this task, we construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. Besides, we propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance compared with eight strong baselines. To fairly evaluate the performance, we also propose three specific evaluation metrics. Experimental results demonstrate the utility of the annotated MedLane dataset and the effectiveness of the proposed model DECLARE.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses the automatic simplification of expert clinical language into language that laypeople can understand. Patients with low health literacy often struggle with medical terminology and the complex structure of professional medical language. Although prior studies have proposed methods for translating expert language into lay language, most optimize either accuracy or readability rather than both at once, so clinical language simplification remains a challenging task that has not been fully resolved. Existing datasets and models also have limitations, such as small dataset sizes and a lack of term-level annotations. To address these issues, the paper constructs a new dataset (MedLane), proposes a new model (DECLARE), and introduces three task-specific evaluation metrics, with the aim of improving both the accuracy and the readability of simplified clinical text.