Lexical Characteristics Analysis of Chinese Clinical Documents

Meizhi Ju, Huilong Duan, Haomin Li
DOI: https://doi.org/10.1109/itme.2015.51
2015-01-01
Abstract: Understanding the lexical characteristics of clinical documents is the foundation of the sublanguage-based Medical Language Processing (MLP) approach. However, few studies have focused on the lexical characteristics of Chinese clinical documents. In this study, a lexical characteristics analysis at both the syntactic and semantic levels was conducted on a clinical corpus of 3,500 clinical documents generated during daily practice. The analysis was based on the automatic tagging results of a lexicon-based part-of-speech (POS) and semantic tagging method. The medical lexicon contains 237,291 entries annotated with both semantic and syntactic classes. The normalized frequencies of different terms, syntactic classes, and semantic classes were calculated and visualized. The major contributions of this paper are a wide-coverage Chinese medical semantic lexicon and a description of the lexical characteristics of Chinese clinical documents. Both will lay a good foundation for sublanguage-based MLP studies in China.
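
To make the pipeline described in the abstract concrete, the following is a minimal sketch of lexicon-based tagging followed by a normalized frequency count. The abstract does not state the matching strategy or the normalization used, so both are assumptions here: the sketch uses greedy forward maximum matching and plain relative frequency (count divided by total tokens). The lexicon entries, class labels, and the function names `tag` and `normalized_frequency` are hypothetical and stand in for the paper's 237,291-entry lexicon.

```python
from collections import Counter

# Hypothetical toy lexicon: term -> (syntactic class, semantic class).
# These entries and labels are illustrative, not taken from the paper.
LEXICON = {
    "患者": ("noun", "person"),
    "头痛": ("noun", "symptom"),
    "发热": ("verb", "symptom"),
    "无": ("adverb", "negation"),
}
MAX_TERM_LEN = max(len(term) for term in LEXICON)


def tag(text):
    """Tag text by greedy forward maximum matching against LEXICON.

    Assumed strategy: at each position, take the longest lexicon term
    that matches; unmatched characters are tagged as unknown.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_TERM_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in LEXICON:
                tokens.append((piece, *LEXICON[piece]))
                i += size
                break
        else:
            # No lexicon term starts here; emit a single unknown character.
            tokens.append((text[i], "unknown", "unknown"))
            i += 1
    return tokens


def normalized_frequency(documents, field):
    """Relative frequency of each value of one token field over a corpus.

    field: 0 = term, 1 = syntactic class, 2 = semantic class.
    """
    counts = Counter(tok[field] for doc in documents for tok in tag(doc))
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}
```

Calling `normalized_frequency(corpus, 2)` on a list of document strings would give the share of tokens falling in each semantic class; passing `1` or `0` gives the same breakdown by syntactic class or by term.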