Approach of Structured Information Extraction for Medical Text Data
YANG Bing, NIE Tie-zheng, SHEN De-rong, KOU Yue, YU Ge
DOI: https://doi.org/10.3969/j.issn.1000-1220.2019.07.024
2019-01-01
Abstract: Texts are an important information carrier in the medical field and provide data that support clinical diagnosis and pathological research. However, texts written in natural language are often unstructured and difficult to understand and process automatically. Chinese medical texts are especially challenging: they are highly specialized, requiring extensive domain knowledge, and they rely heavily on short sentences, which makes information extraction more difficult. This paper therefore proposes an approach for extracting structured information from medical text data. The approach first uses text clustering and keyword extraction to obtain the expressions commonly used in medical descriptions, and then builds a medical term database that assists Chinese word segmentation and improves segmentation quality on Chinese medical texts. Next, we analyze the semantic dependencies between words and construct syntactic dependency trees, from which key indicators and their corresponding values are identified and extracted to produce structured output data. Experiments on medical image report texts show that the approach effectively improves the quality of Chinese word segmentation, with accuracy of up to 98.24%. It also performs well on structured knowledge extraction, reaching up to 83.76% accuracy and 88.09% recall. In addition, the approach covers a variety of dependency grammar patterns and therefore has good applicability.
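To make the role of the term database concrete, the following is a minimal sketch of dictionary-assisted segmentation using forward maximum matching. It is an illustration only, not the segmenter used in the paper: the function name `segment`, the `max_len` parameter, and the sample terms are all assumptions introduced here.

```python
def segment(text, lexicon, max_len=6):
    """Forward maximum matching: at each position, take the longest
    substring found in the lexicon, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical domain terms, standing in for a mined medical term database.
medical_terms = {"肝脏", "形态", "大小", "正常"}

# "Liver shape and size are normal": a short report-style sentence.
tokens = segment("肝脏形态大小正常", medical_terms)
print(tokens)
```

With an empty lexicon the same sentence falls apart into single characters, which is the kind of degradation a domain term list is meant to prevent.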