Construction Of Word Network From Traditional Chinese Medicine Corpus

Hua Cha, Haiming Lu, Tong Yu
DOI: https://doi.org/10.12783/dtcse/itms2016/9460
2016-01-01
Abstract: In this paper, we automatically construct a quantified network of traditional Chinese medicine (TCM) terms, using cosine distance as the measure of relatedness. After scanning the corpus, we obtained a set of word vectors whose pairwise relationships can be measured. Clustering these vectors yielded a three-level network organized as a category tree. The leaves correspond to different types of words, giving clusters such as herbs, diseases, and theories of medicine. From each category, we selected the words nearest to the cluster center and invited experts to judge whether each word was a correct TCM term that had not yet been collected; the resulting new-word extraction rate was around 70%. Because the network is almost entirely machine-generated, it is much more efficient to build, and the knowledge it contains might lead to several new approaches in TCM.
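The candidate-selection step described in the abstract (ranking the words in a cluster by cosine distance to the cluster center) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three-dimensional vectors, and the example words are all hypothetical, and the cluster is assumed to have been produced already by some clustering step.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def centroid(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_to_center(cluster, top_k=2):
    """Return the top_k words whose vectors are closest to the cluster centroid.

    cluster maps each word to its vector (hypothetical toy data here).
    """
    center = centroid(list(cluster.values()))
    ranked = sorted(cluster, key=lambda w: cosine_distance(cluster[w], center))
    return ranked[:top_k]


# Hypothetical toy cluster; real word vectors would come from the corpus.
herbs = {
    "ginseng": [0.9, 0.1, 0.0],
    "licorice": [0.8, 0.2, 0.1],
    "ginger": [0.7, 0.3, 0.0],
    "outlier": [0.1, 0.9, 0.4],
}

# Words nearest the center would be the ones passed to experts for review.
candidates = nearest_to_center(herbs, top_k=2)
print(candidates)
```

In this sketch the word whose vector points in a different direction from the rest of the cluster ranks last, so it would not be offered as a candidate term.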