Automatic de-identification of French electronic health records: a cost-effective approach exploiting distant supervision and deep learning models

Mohamed El Azzouzi, Gouenou Coatrieux, Reda Bellafqira, Denis Delamarre, Christine Riou, Naima Oubenali, Sandie Cabon, Marc Cuggia, Guillaume Bouzillé
DOI: https://doi.org/10.1186/s12911-024-02422-5
IF: 3.298
2024-02-18
BMC Medical Informatics and Decision Making
Abstract: Electronic health records (EHRs) contain valuable information for clinical research; however, the sensitive nature of healthcare data presents security and confidentiality challenges. De-identification is therefore essential to protect personal data in EHRs and comply with government regulations. Named entity recognition (NER) methods have been proposed to remove personal identifiers, with deep learning-based models achieving better performance. However, manual annotation of training data is time-consuming and expensive. The aim of this study was to develop an automatic de-identification pipeline for all kinds of clinical documents, based on a distantly supervised method, in order to significantly reduce the cost of manual annotation and to facilitate the transfer of the pipeline to other clinical centers.
medical informatics