Active learning for medical code assignment

Martha Dais Ferreira, Michal Malyska, Nicola Sahar, Riccardo Miotto, Fernando Paulovich, Evangelos Milios
DOI: https://doi.org/10.48550/arXiv.2104.05741
IF: 5.414
2021-04-12
Machine Learning
Abstract: Machine Learning (ML) is widely used to automatically extract meaningful information from Electronic Health Records (EHR) to support operational, clinical, and financial decision-making. However, ML models require a large number of annotated examples to provide satisfactory results, which is not possible in most healthcare scenarios due to the high cost of clinician-labeled data. Active Learning (AL) is a process of selecting the most informative instances to be labeled by an expert to further train a supervised algorithm. We demonstrate the effectiveness of AL in multi-label text classification in the clinical domain. In this context, we apply a set of well-known AL methods to help automatically assign ICD-9 codes on the MIMIC-III dataset. Our results show that the selection of informative instances provides satisfactory classification with a significantly reduced training set (8.3% of the total instances). We conclude that AL methods can significantly reduce the manual annotation cost while preserving model performance.
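The selection step described in the abstract can be illustrated with uncertainty sampling, one well-known AL strategy. The abstract does not say which AL methods the authors used, so the sketch below is an assumption for illustration only and not the paper's method. The function names, the mean-entropy score, and the probability values are all hypothetical. It ranks unlabeled instances by the average binary entropy of their predicted label probabilities and returns the most uncertain ones for expert labeling.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def uncertainty_scores(label_probs):
    """Mean per-label entropy for each unlabeled instance (multi-label case)."""
    return [sum(binary_entropy(p) for p in row) / len(row) for row in label_probs]


def select_queries(label_probs, batch_size):
    """Indices of the batch_size most uncertain instances, most uncertain first."""
    scores = uncertainty_scores(label_probs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical model outputs: 4 unlabeled notes, 3 candidate codes each.
pool_probs = [
    [0.99, 0.01, 0.02],  # confident on every code
    [0.50, 0.45, 0.55],  # uncertain on every code
    [0.90, 0.50, 0.10],  # uncertain on one code
    [0.05, 0.95, 0.03],  # confident on every code
]

# Instances to send to the expert annotator in this round.
queries = select_queries(pool_probs, batch_size=2)
print(queries)
```

In a full loop, the selected instances would be labeled, moved from the pool into the training set, and the classifier retrained before the next round of selection.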