Topic model based medical record classification method

Zhang Yin, Li Zherong, Yao Liang, Wei Baogang
2017-01-01
Abstract: The invention discloses a topic-model-based medical record classification method comprising the following steps: 1) extracting individual medical records from medical record books through OCR and text structuring; 2) preprocessing all structured medical records with a Chinese word segmentation tool, where the preprocessing comprises word segmentation and stop-word removal; 3) filtering the word segmentation results of each medical record against domain dictionaries for traditional Chinese medicines, prescriptions, diseases, symptoms, syndromes and treatment methods, yielding six word lists per medical record; 4) building a medical record topic model containing seven categories: common words, traditional Chinese medicines, prescriptions, diseases, symptoms, syndromes and treatment methods; 5) inputting the six word lists obtained in step 3) for each medical record, together with the remaining words of that record, into the topic model for training, and obtaining the document-topic distribution through Gibbs sampling; and 6) inputting the document-topic distribution into a trained SVM classifier to obtain the corresponding category.
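The dictionary filtering in step 3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `split_by_domain`, the category keys, and the toy dictionary entries are all hypothetical, and the input is assumed to be a token list that has already been segmented and stripped of stop words. Tokens that match none of the six dictionaries are collected as the "common" words that step 5) feeds into the topic model alongside the six lists.

```python
from typing import Dict, List, Set

# Hypothetical toy dictionaries; the actual domain dictionaries
# are not specified in the abstract.
DOMAIN_DICTIONARIES: Dict[str, Set[str]] = {
    "medicine": {"ginseng", "licorice"},
    "prescription": {"Sijunzi decoction"},
    "disease": {"insomnia"},
    "symptom": {"fatigue", "palpitation"},
    "syndrome": {"qi deficiency"},
    "treatment": {"tonify qi"},
}


def split_by_domain(
    tokens: List[str],
    dictionaries: Dict[str, Set[str]] = DOMAIN_DICTIONARIES,
) -> Dict[str, List[str]]:
    """Split one record's tokens into six domain word lists plus leftovers."""
    word_lists: Dict[str, List[str]] = {name: [] for name in dictionaries}
    word_lists["common"] = []  # words left over after filtering
    for token in tokens:
        matched = False
        # Assumption: each dictionary is applied independently, so a token
        # found in several dictionaries is added to each matching list.
        for name, vocabulary in dictionaries.items():
            if token in vocabulary:
                word_lists[name].append(token)
                matched = True
        if not matched:
            word_lists["common"].append(token)
    return word_lists


# Usage on an already segmented, stop-word-free record (illustrative tokens).
tokens = ["patient", "fatigue", "insomnia", "ginseng", "licorice", "tonify qi"]
result = split_by_domain(tokens)
```

The seven resulting lists line up with the seven categories of the topic model in step 4); how the model consumes them during Gibbs sampling is not detailed in the abstract and is left out of this sketch.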