Evaluating Semantic Similarity Between Chinese Biomedical Terms Through Multiple Ontologies with Score Normalization: an Initial Study.

Wenxin Ning, Ming Yu, Dehua Kong
DOI: https://doi.org/10.1016/j.jbi.2016.10.017
IF: 8
2016-01-01
Journal of Biomedical Informatics
Abstract:
Background: Semantic similarity estimation significantly promotes the understanding of natural language resources and supports medical decision making. Previous studies have investigated semantic similarity and relatedness estimation between biomedical terms using English-language resources such as SNOMED-CT or UMLS. However, very few studies have focused on Chinese, and natural language processing and text mining technology for medical documents in China is urgently needed. Because no complete, publicly available biomedical ontology exists in China, we have access only to several modest-sized ontologies with no overlaps. Although these ontologies together do not provide complete coverage of biomedicine, each covers its own domain acceptably. In this paper, semantic similarity estimation between Chinese biomedical terms using these multiple non-overlapping ontologies was explored as an initial study.
Methods: Typical path-based and information content (IC)-based similarity measures were applied to these ontologies. Analysis of the computed similarity scores revealed heterogeneity in the statistical distributions of scores derived from different ontologies. This heterogeneity hampers the comparability of scores and the overall accuracy of similarity estimation. The problem was addressed through a novel, language-independent method that combines semantic similarity estimation with score normalization. A reference standard was also created in this study.
Results: Compared with existing task-independent normalization methods, the newly developed method performed better with most IC-based similarity measures. Score normalization improved the accuracy of semantic similarity estimation, and this improvement resulted from mitigating the heterogeneity in similarity scores derived from multiple ontologies.
Conclusion: We demonstrated the potential necessity of score normalization when estimating semantic similarity with ontology-based measures. The results of this study can also be extended to other languages to implement semantic similarity estimation in biomedicine. (C) 2016 Elsevier Inc. All rights reserved.
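The abstract names path-based and IC-based measures and per-ontology score heterogeneity without giving formulas. The sketch below illustrates the general problem on two hypothetical toy ontologies. Everything in it is an assumption chosen for illustration: the term names and hierarchies, the 1/(1 + distance) path measure, Seco-style intrinsic IC, Lin's IC-based measure, and plain z-score normalization as a stand-in for a generic task-independent baseline. It is not the authors' novel normalization method, which the abstract does not describe.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy ontologies (child -> parent, single root, no cycles).
# Ontology A is deeper and larger; ontology B is shallow and smaller.
ONTOLOGY_A = {
    "infectious disease": "disease",
    "viral infection": "infectious disease",
    "influenza": "viral infection",
    "hepatitis B": "viral infection",
    "bacterial infection": "infectious disease",
    "tuberculosis": "bacterial infection",
    "neoplasm": "disease",
    "lung cancer": "neoplasm",
}
ONTOLOGY_B = {
    "antibiotic": "drug",
    "penicillin": "antibiotic",
    "amoxicillin": "antibiotic",
    "analgesic": "drug",
    "aspirin": "analgesic",
}
PAIRS_A = [("influenza", "hepatitis B"), ("influenza", "tuberculosis"), ("influenza", "lung cancer")]
PAIRS_B = [("penicillin", "amoxicillin"), ("penicillin", "aspirin")]


def ancestors(term, parent):
    """Return [term, parent, grandparent, ..., root]."""
    chain = [term]
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def lowest_common_subsumer(a, b, parent):
    """First ancestor of a that is also an ancestor of b (always exists with a single root)."""
    up_b = set(ancestors(b, parent))
    return next(t for t in ancestors(a, parent) if t in up_b)


def path_similarity(a, b, parent):
    """Path-based measure (assumed form): 1 / (1 + number of is-a edges between a and b)."""
    lcs = lowest_common_subsumer(a, b, parent)
    distance = ancestors(a, parent).index(lcs) + ancestors(b, parent).index(lcs)
    return 1.0 / (1.0 + distance)


def intrinsic_ic(term, parent):
    """Seco-style intrinsic IC: 1 - log(hyponyms + 1) / log(number of concepts)."""
    nodes = set(parent) | set(parent.values())
    hyponyms = sum(1 for n in nodes if n != term and term in ancestors(n, parent))
    return 1.0 - math.log(hyponyms + 1) / math.log(len(nodes))


def lin_similarity(a, b, parent):
    """IC-based measure (Lin): 2 * IC(lcs) / (IC(a) + IC(b))."""
    lcs = lowest_common_subsumer(a, b, parent)
    denom = intrinsic_ic(a, parent) + intrinsic_ic(b, parent)
    return 2.0 * intrinsic_ic(lcs, parent) / denom if denom else 0.0


def z_normalize(scores):
    """Z-score normalization applied separately within one ontology (baseline stand-in)."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma if sigma else 0.0 for s in scores]


def normalized_scores(parent, pairs):
    """Raw Lin scores for the term pairs and their within-ontology z-scores."""
    raw = [lin_similarity(a, b, parent) for a, b in pairs]
    return raw, z_normalize(raw)


for label, onto, pairs in (("A", ONTOLOGY_A, PAIRS_A), ("B", ONTOLOGY_B, PAIRS_B)):
    raw, z = normalized_scores(onto, pairs)
    print(label, [round(s, 3) for s in raw], [round(v, 3) for v in z])
```

In this toy setup, a pair of sibling leaves scores 0.5 under Lin's measure in ontology A but only about 0.39 in ontology B, purely because the two hierarchies differ in size. Raw scores from the two sources therefore sit on different scales. Normalizing within each ontology puts them on a common scale before they are compared or pooled, which is the kind of heterogeneity the abstract describes.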
What problem does this paper attempt to address?