Traditional Chinese Medicine Symptom Normalization Approach Leveraging Hierarchical Semantic Information and Text Matching with Attention Mechanism

Qi Jia, Dezheng Zhang, Shibing Yang, Chao Xia, Yingjie Shi, Hu Tao, Cong Xu, Xiong Luo, Yuekun Ma, Yonghong Xie
DOI: https://doi.org/10.1016/j.jbi.2021.103718
IF: 8
2021-01-01
Journal of Biomedical Informatics
Abstract: Traditional Chinese medicine (TCM) symptom normalization is difficult for three reasons: the same symptom can have different literal descriptions, one description can map to several normalized symptoms, and different symptoms can share a similar literal description. To address these problems, we propose a novel two-step approach that uses hierarchical semantic information representing the functional characteristics of symptoms, and we develop a text matching model that integrates this hierarchical semantic information through an attention mechanism. In this study, we constructed a symptom normalization dataset and a TCM normalization symptom dictionary containing normalization symptom words, and assigned the symptoms to 24 classes of functional characteristics. First, we built a multi-label text classifier to extract the hierarchical semantic information from each symptom description, count the corresponding normalization symptoms, and filter the candidate set. Then we designed a text matching model that mixes multi-granularity language features with an attention mechanism and uses the hierarchical semantic information to calculate the matching score between the symptom description and the normalization symptom words. We compared our approach with other baselines on real-world data. Our approach gives the best performance, with Hit@1, Hit@3, and Hit@10 of 0.821, 0.953, and 0.993, respectively, and a MeanRank of 1.596, significantly outperforming the baselines on the symptom normalization task. We developed an approach for the TCM symptom normalization task and demonstrated its superior performance compared with other baselines, indicating the promise of this research direction.
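The abstract reports Hit@k and MeanRank without defining them. The sketch below assumes the standard ranking definitions: Hit@k is the fraction of symptom descriptions whose correct normalization word appears among the top k candidates, and MeanRank is the average 1-based position of the correct word. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Mapping, Sequence


def rank_of_gold(scores: Mapping[str, float], gold: str) -> int:
    """Return the 1-based rank of the gold candidate when sorted by descending score."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ordered.index(gold) + 1


def hit_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose gold candidate is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks: Sequence[int]) -> float:
    """Average 1-based rank of the gold candidate across queries (lower is better)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(ranks) / len(ranks)


# Hypothetical matching scores for one symptom description (illustrative only).
example_scores = {"cand_a": 0.9, "cand_b": 0.5, "cand_c": 0.7}
example_rank = rank_of_gold(example_scores, "cand_c")

# Hypothetical gold ranks collected over four symptom descriptions.
example_ranks = [1, 2, 1, 5]
```

Under these definitions, a MeanRank close to 1 together with a Hit@10 close to 1 means the correct normalization word is almost always near the top of the candidate list.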