AHIAP - An Agile Medical Named Entity Recognition and Relation Extraction Framework Based on Active Learning.

Ming Sheng, Jing Dong, Yong Zhang, Yuelin Bu, Anqi Li, Weihang Lin, Xin Li, Chunxiao Xing
DOI: https://doi.org/10.1007/978-3-030-61951-0_7
2020-01-01
Abstract: Knowledge graphs play a significant role in many domains by providing a wide range of assistance. In the medical domain, clinical guidelines, academic papers, Electronic Medical Records (EMRs), and data crawled from the Internet contain essential information. These data are vital to knowledge graph construction but are usually unstructured. Constructing a knowledge graph from unstructured data requires a large number of medical experts to annotate the data based on their prior experience and knowledge. The quality of a knowledge graph depends heavily on the performance of medical named entity recognition and relation extraction, both of which rely on data annotation. However, given the large amount of data involved, manual labelling is a labor-intensive task. In addition, data is generated rapidly, so annotation and extraction must be fast enough to keep pace with data accumulation. We therefore propose AHIAP, a named entity recognition and relation extraction framework, to address these problems. AHIAP uses active learning to reduce the labor cost of the annotation process while maintaining annotation quality. AHIAP has two modules: an active learning module that reduces labor cost and a measurement module that controls quality. With active learning, AHIAP needs only 200 samples to reach an accuracy of 70%, whereas the standard learning strategy needs 4000 records to reach the same accuracy.
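The abstract does not say which query strategy AHIAP's active learning module uses. As a rough illustration of the general idea only, the sketch below assumes least-confidence uncertainty sampling, a common active learning strategy: the current model scores the unlabeled samples, and the ones it is least sure about are sent to the experts first. The function names, sample ids, and probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def select_for_annotation(
    predictions: Dict[str, Sequence[float]], batch_size: int
) -> List[str]:
    """Return the ids of the `batch_size` most uncertain unlabeled samples."""
    ranked = sorted(
        predictions,
        key=lambda sample_id: least_confidence(predictions[sample_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size]


# Hypothetical model outputs: per-sample probabilities over three entity labels.
unlabeled_predictions = {
    "sent-1": [0.98, 0.01, 0.01],  # confident prediction, low annotation value
    "sent-2": [0.40, 0.35, 0.25],  # uncertain
    "sent-3": [0.55, 0.30, 0.15],
    "sent-4": [0.34, 0.33, 0.33],  # most uncertain
}

# Ask the annotators to label only the two samples the model is least sure of.
to_annotate = select_for_annotation(unlabeled_predictions, batch_size=2)
print(to_annotate)
```

In a full loop, the newly labeled samples would be added to the training set, the model retrained, and the selection repeated. Spending annotation effort on the most informative samples is the kind of saving that makes a gap such as 200 versus 4000 labeled samples plausible.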