A Deep Learning-Based System for the MEDDOCAN Task

Dehuan Jiang, Yedan Shen, Shuai Chen, Buzhou Tang, Xiaolong Wang, Qingcai Chen, Ruifeng Xu, Jun Yan, Yi Zhou
2019-01-01
Abstract: Due to privacy constraints, de-identification, that is, identifying and removing all protected health information (PHI) mentions, is a prerequisite for accessing and sharing clinical records outside of hospitals. Many studies on de-identification have been conducted in recent years, driven especially by the efforts of i2b2 (the Center of Informatics for Integrating Biology and the Bedside). The i2b2 community has organized several challenges on de-identification of clinical text in English. In 2019, Martin Krallinger et al. organized a challenge task devoted specifically to the anonymization of medical documents in Spanish, called the MEDDOCAN (Medical Document Anonymization) task. We participated in this task and developed a deep learning-based system for it. Our system was developed on a training set of 500 records and a development set of 250 records. Evaluation on a test set of 250 records shows that our system achieved a “strict” F1-score of 0.9646 at the entity level, a “strict” F1-score of 0.97 at the span level, and a “merged” F1-score of 0.9821 at the span level.