An Automatic Approach for Constructing a Knowledge Base of Symptoms in Chinese.

Tong Ruan, Mengjie Wang, Jian Sun, Ting Wang, Lu Zeng, Yichao Yin, Ju Gao
DOI: https://doi.org/10.1186/s13326-017-0145-x
2017-01-01
Journal of Biomedical Semantics
Abstract: While a large number of well-known knowledge bases (KBs) in the life sciences have been published as Linked Open Data, few of them are in Chinese. Such KBs are nevertheless necessary for automatically processing and analyzing electronic medical records (EMRs) written in Chinese. Among them, a symptom KB in Chinese is the most urgently needed, since symptoms are the starting point of clinical diagnosis. Furthermore, the expressions used to describe symptoms in clinical practice are diverse, which makes such a KB hard to compile. In this paper, we publish a public KB of symptoms in Chinese. The KB is constructed by fusing data automatically extracted from eight mainstream healthcare websites and three Chinese encyclopedia sites, supplemented with symptoms extracted from a large number of EMRs. As a result, the KB contains more than 26,000 distinct symptoms in Chinese, including 3,968 symptoms in traditional Chinese medicine (TCM), as well as 1,029 synonym pairs for symptoms. The KB also includes concepts such as diseases and medicines, along with relations between symptoms and these related entities. We also link our KB to the Unified Medical Language System (UMLS) and analyze the differences between the symptoms in the two KBs. We have released the KB as Linked Open Data, together with a demo, at https://datahub.io/dataset/symptoms-in-chinese.