Semi-Supervised Patient Similarity Clustering Algorithm Based on Electronic Medical Records

Jiao Zhang, Dan Chang
DOI: https://doi.org/10.1109/access.2019.2923333
IF: 3.9
2019-01-01
IEEE Access
Abstract: An electronic medical record (EMR) is a comprehensive description of a patient's individual health information recorded during medical activities, and it paves the way for research on intelligent-assisted medical decision-making. However, research on Chinese EMRs faces several challenges: data preprocessing is difficult, data labeling is time-consuming and laborious, and EMR data are stored in diverse forms. To address these challenges, this paper constructs a medical domain dictionary for word segmentation to improve the accuracy of Chinese medical terminology recognition in EMRs. Chinese text features are extracted with the latent Dirichlet allocation (LDA) model, and the patient feature vector is constructed by feature stitching. In addition, the paper uses a cumulative method over multiple impact indicators to construct a set of pairwise patient constraints as supervision information, which guides the clustering model toward better patient groupings. The comparison results show that the clustering algorithm with supervision information outperforms the purely unsupervised clustering algorithm. When constructing the supervision information set, using multi-dimensional patient attributes as influence factors works better than using the diagnosis result alone, and the clustering effect improves as the number of pairwise constraints increases.
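The pairwise-constraint step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the indicator names, the integer weights in `WEIGHTS`, the two thresholds, and the sample `PATIENTS` are all hypothetical, chosen only to show how agreement across several patient attributes might be accumulated into must-link and cannot-link pairs.

```python
from itertools import combinations

# Hypothetical indicator weights (assumption; the abstract does not list
# the actual indicators or how they are weighted).
WEIGHTS = {"diagnosis": 5, "department": 2, "age_group": 2, "sex": 1}

# Hypothetical toy patients, for illustration only.
PATIENTS = [
    {"diagnosis": "diabetes", "department": "endocrinology", "age_group": "60+", "sex": "F"},
    {"diagnosis": "diabetes", "department": "endocrinology", "age_group": "60+", "sex": "M"},
    {"diagnosis": "fracture", "department": "orthopedics", "age_group": "20-39", "sex": "M"},
    {"diagnosis": "diabetes", "department": "cardiology", "age_group": "40-59", "sex": "F"},
]


def pair_score(a, b, weights=WEIGHTS):
    """Accumulate the weights of all indicators on which two patients agree."""
    return sum(w for key, w in weights.items() if a[key] == b[key])


def build_constraints(patients, must_at_least=8, cannot_at_most=2):
    """Return (must_link, cannot_link) as lists of patient index pairs.

    Pairs with a high cumulative score become must-link constraints,
    pairs with a low score become cannot-link constraints, and pairs
    in between are left unconstrained. Both thresholds are assumptions.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(patients)), 2):
        score = pair_score(patients[i], patients[j])
        if score >= must_at_least:
            must_link.append((i, j))
        elif score <= cannot_at_most:
            cannot_link.append((i, j))
    return must_link, cannot_link


def count_violations(labels, must_link, cannot_link):
    """Count the constraints that a cluster assignment breaks."""
    broken_must = sum(labels[i] != labels[j] for i, j in must_link)
    broken_cannot = sum(labels[i] == labels[j] for i, j in cannot_link)
    return broken_must + broken_cannot


must_link, cannot_link = build_constraints(PATIENTS)
```

A constrained clustering method could use `count_violations` as a penalty or as a feasibility check when assigning patients to clusters. With the toy data above, scoring on `diagnosis` alone would link patients 0, 1, and 3, whereas the cumulative score links only patients 0 and 1, which share department and age group as well.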