A unified framework of medical information annotation and extraction for Chinese clinical text
Enwei Zhu, Qilin Sheng, Huanwan Yang, Yiyang Liu, Ting Cai, Jinpeng Li
DOI: https://doi.org/10.1016/j.artmed.2023.102573
IF: 7.011
2023-05-21
Artificial Intelligence in Medicine
Abstract: Medical information extraction consists of a group of natural language processing (NLP) tasks that collaboratively convert clinical text into pre-defined structured formats. This is a critical step in exploiting electronic medical records (EMRs). Given recent advances in NLP technologies, model implementation and performance no longer seem to be an obstacle; the bottleneck instead lies in obtaining a high-quality annotated corpus and in the whole engineering workflow. This study presents an engineering framework consisting of three tasks: medical entity recognition, relation extraction, and attribute extraction. Within this framework, the whole workflow is demonstrated, from EMR data collection through model performance evaluation. Our annotation scheme is designed to be comprehensive and compatible across the multiple tasks. Drawing on EMRs from a general hospital in Ningbo, China, and manual annotation by experienced physicians, our corpus is large-scale and of high quality. Built upon this Chinese clinical corpus, the medical information extraction system shows performance that approaches human annotation. The annotation scheme, (a subset of) the annotated corpus, and the code are all publicly released to facilitate further research.
engineering, biomedical; computer science, artificial intelligence; medical informatics