A Similarity Algorithm Based on the Generality and Individuality of Words

Yinfeng Zou, Chunping Ouyang, Yongbin Liu, Xiaohua Yang
DOI: https://doi.org/10.1007/978-3-319-50496-4_48
2016-01-01
Abstract: HowNet is a popular platform for Chinese text similarity calculation. This study finds that the HowNet architecture, its organization of vocabulary, and its concept descriptions still have shortcomings that affect word similarity measurement. Hence, based on an analysis of the generality and individuality of words in HowNet, a similarity algorithm based on the generality and individuality of words is proposed. The experimental data come from the NLPCC-ICCPOL 2016 Chinese word similarity evaluation task data set. Experimental results show that the algorithm is more feasible and stable than some other classic algorithms and outperforms them. Moreover, the size of the experimental data set has little influence on the results. Across all experiments, the Pearson correlation coefficient and Spearman's coefficient stably reached 0.460 and 0.440, respectively.
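The abstract evaluates the algorithm by correlating its similarity scores with reference scores using the Pearson and Spearman coefficients. The following is a minimal sketch of how such an evaluation could be computed. The function names (`pearson`, `average_ranks`, `spearman`) and the sample scores are hypothetical illustrations, not taken from the paper or from the NLPCC-ICCPOL 2016 data set.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five word pairs (illustrative only).
reference_scores = [1.0, 2.5, 4.0, 7.5, 9.0]     # assumed human ratings
algorithm_scores = [0.10, 0.30, 0.25, 0.70, 0.95]  # assumed algorithm output

print("Pearson:", pearson(reference_scores, algorithm_scores))
print("Spearman:", spearman(reference_scores, algorithm_scores))
```

Pearson measures linear agreement between the raw scores, while Spearman compares only their rank order, so reporting both shows whether an algorithm gets the ordering of word pairs right even when its score scale differs from the reference.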