EXTRACTING INFORMATION FROM CHINESE PRESCRIPTION PHARMACEUTICALS BASED ON NPOS SHORTEST-PATH WORD SEGMENTATION ALGORITHM

Shi Shaomin, Yang Yan, Wei Baogang
DOI: https://doi.org/10.3969/j.issn.1000-386X.2010.09.048
2010-01-01
Abstract: Taking the extraction of composition information for Chinese prescription pharmaceuticals in traditional Chinese medicine (TCM) as its case, this paper describes a method for overcoming the restriction of unstructured data in the informatisation of the TCM and pharmaceuticals sector, and introduces the major technical procedures of text information extraction. The proposed extraction framework covers building a customized herbal medicine dictionary, locating the Chinese prescription information by page, segmenting the prescription composition text, and extracting the pharmaceutical composition information. Extraction of the prescription information relies mainly on regular expressions and takes into account the particular characteristics of TCM and pharmaceutical books. Word segmentation of the composition text uses a shortest-path Chinese word segmentation algorithm based on the NPOS model, and special cases involving quantifiers, nouns and verbs in prescriptions are handled separately when extracting the pharmaceuticals. Experiments demonstrate that the method achieves a fairly high extraction accuracy.
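The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The dictionary contents, the `组成：` section label, and the function names `locate_composition`, `segment` and `extract_herbs` are all assumptions made for illustration. The segmenter is a plain dictionary-driven shortest-path (fewest-words) search and does not model the NPOS component, which the abstract does not specify.

```python
import re

# Hypothetical toy dictionary; the paper's customized herbal dictionary is not given.
HERBS = {"人参", "白术", "茯苓", "甘草", "炙甘草"}

def locate_composition(page):
    """Find the composition line with a regular expression.
    The '组成' label is an assumed page layout, not taken from the paper."""
    match = re.search(r"组成[:：]\s*(.+)", page)
    return match.group(1).strip() if match else None

def segment(text, dictionary=HERBS):
    """Fewest-words segmentation: shortest path over the word graph.
    Edges are dictionary words plus single characters as a fallback."""
    n = len(text)
    max_len = max((len(w) for w in dictionary), default=1)
    inf = float("inf")
    # best[i] = (word count, previous cut position) for the prefix text[:i]
    best = [(0, -1)] + [(inf, -1)] * n
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if j == i + 1 or text[i:j] in dictionary:
                cost = best[i][0] + 1
                if cost < best[j][0]:
                    best[j] = (cost, i)
    # Walk the back-pointers from the end to recover the words.
    words, j = [], n
    while j > 0:
        i = best[j][1]
        words.append(text[i:j])
        j = i
    return words[::-1]

def extract_herbs(text, dictionary=HERBS):
    """Keep only segments that are dictionary herbs (drops quantities and units)."""
    return [w for w in segment(text, dictionary) if w in dictionary]

page = "方名：四君子汤\n组成：人参三钱白术二钱炙甘草一钱\n用法：水煎服"
print(extract_herbs(locate_composition(page)))
```

Preferring the path with the fewest words makes the segmenter pick the longer entry `炙甘草` over the split `炙` + `甘草`, which is the main reason for using a shortest-path criterion with a domain dictionary in this sketch.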