MKGB: A Medical Knowledge Graph Construction Framework Based on Data Lake and Active Learning

Peng Ren, Wei Hou, Ming Sheng, Xin Li, Chao Li, Yong Zhang
DOI: https://doi.org/10.1007/978-3-030-90885-0_22
2021-01-01
Abstract: A medical knowledge graph (MKG) provides ideal technical support for integrating multi-source heterogeneous data and enhancing graph-based services. These multi-source data are usually huge, heterogeneous, and difficult to manage. To ensure that the resulting MKG is of high quality, constructing it from these data requires many medical experts to annotate the data based on their expertise. However, given such a large amount of data, manual annotation is highly labor-intensive. In addition, medical data are generated rapidly, so they must be managed and annotated efficiently to keep pace with data accumulation. Prior research has lacked efficient management of massive medical data, and few studies have focused on constructing large-scale, high-quality MKGs. To address these problems, we propose a Medical Knowledge Graph Builder (MKGB) based on Data Lake and active learning. MKGB consists of four modules: a data acquisition module, a data management framework module based on Data Lake, an active learning module for reducing labor cost, and an MKG construction module. Building on the efficient management of extensive medical data provided by the Data Lake-based framework, MKGB uses active learning based on the doctor-in-the-loop idea to reduce the labor cost of annotation while ensuring annotation quality, enabling the construction of a large-scale, high-quality MKG. We demonstrate that our approach significantly reduces the cost of manual annotation and generates a more reliable MKG.
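The abstract does not say which query strategy MKGB's active learning module uses, so the following is only a minimal sketch of the general doctor-in-the-loop idea, not the authors' method. It assumes least-confidence uncertainty sampling (a standard active-learning strategy) as a stand-in: a model scores candidate triples, and only the `budget` most uncertain ones are sent to a medical expert. All names (`least_confidence`, `select_for_annotation`, `doctor_in_the_loop_round`) and the placeholder triples are hypothetical, and keeping the model's own prediction for the non-queried candidates is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def least_confidence(p: float) -> float:
    """Uncertainty of a binary prediction: 1.0 when p == 0.5, 0.0 when p is 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_for_annotation(probs: Sequence[float], budget: int) -> List[int]:
    """Indices of the `budget` most uncertain candidates (ties broken by position)."""
    ranked = sorted(range(len(probs)), key=lambda i: (-least_confidence(probs[i]), i))
    return ranked[:budget]


def doctor_in_the_loop_round(
    candidates: Sequence[Triple],
    probs: Sequence[float],
    budget: int,
    ask_doctor: Callable[[Triple], bool],
    threshold: float = 0.5,
) -> Tuple[Dict[Triple, bool], int]:
    """One hypothetical annotation round.

    Assumption: the expert labels only the most uncertain candidates, and every
    other candidate keeps the model's prediction (p >= threshold). Returns the
    labels and the number of expert queries spent.
    """
    queried = set(select_for_annotation(probs, budget))
    labels: Dict[Triple, bool] = {}
    for i, triple in enumerate(candidates):
        if i in queried:
            labels[triple] = ask_doctor(triple)  # costly expert annotation
        else:
            labels[triple] = probs[i] >= threshold  # cheap model label
    return labels, len(queried)


# Usage with placeholder triples and a simulated expert (a lookup table).
candidates = [
    ("drug_A", "treats", "disease_X"),
    ("drug_B", "treats", "disease_Y"),
    ("drug_C", "causes", "symptom_Z"),
    ("drug_D", "treats", "disease_W"),
    ("drug_E", "treats", "disease_V"),
]
probs = [0.95, 0.52, 0.10, 0.45, 0.80]  # hypothetical model confidences
expert_answers = {candidates[1]: False, candidates[3]: True}
labels, spent = doctor_in_the_loop_round(
    candidates, probs, budget=2, ask_doctor=expert_answers.__getitem__
)
print(spent, labels)  # spent == 2: only 2 of the 5 candidates reached the expert
```

In a full active-learning loop the model would be retrained on the newly labeled triples before the next round, so that later queries concentrate on whatever the model still finds uncertain; that retraining step is omitted here to keep the sketch small.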