A Hypothesis on Word Similarity and Its Application.

Peng Jin, Likun Qiu, Xuefeng Zhu, Pengyuan Liu
DOI: https://doi.org/10.1007/978-3-319-14331-6_32
2014-01-01
Abstract: A hypothesis is proposed: the semantic distance between synonyms or near-synonyms should have the same characteristics as distance in a metric space. A metric space is a set on which a notion of distance (called a metric) between elements is defined. Three properties must hold: (i) Identity of Indiscernibles: the distance is zero if and only if the two elements are the same. (ii) Symmetry: the distance between elements A and B equals the distance between B and A. (iii) Triangle Inequality: given three elements A, B and C, the sum of the distances of any two pairs is greater than or equal to the distance of the remaining pair. The first two properties are intuitively reasonable; to test the third, we first compute word similarities based on HowNet and then check whether the synonyms or near-synonyms listed in Cilin Extended Edition satisfy it. The experiments show that more than 98.5% of triples (each consisting of three synonyms) satisfy the triangle inequality. Furthermore, we detect a large number of thesaurus errors according to our hypothesis.
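The triangle-inequality check described in the abstract can be illustrated with a minimal sketch. The abstract does not say how similarity is converted to distance, so the mapping `distance = 1 - similarity` used here is an assumption, as are the function names, the toy words `w1` to `w4`, and their similarity scores. None of these come from HowNet, Cilin, or the paper itself.

```python
from itertools import combinations


def to_distance(similarity):
    # Assumption: similarity lies in [0, 1] and distance = 1 - similarity.
    # The paper's actual conversion is not given in the abstract.
    return 1.0 - similarity


def satisfies_triangle(d_ab, d_bc, d_ac, tol=1e-9):
    # Each side must be no longer than the sum of the other two.
    return (
        d_ab + d_bc >= d_ac - tol
        and d_ab + d_ac >= d_bc - tol
        and d_bc + d_ac >= d_ab - tol
    )


def check_synonym_set(words, similarity):
    """Split all 3-word combinations into satisfying and violating triples.

    `similarity` maps an unordered word pair (stored in either order)
    to a similarity score.
    """
    def dist(a, b):
        key = (a, b) if (a, b) in similarity else (b, a)
        return to_distance(similarity[key])

    ok, bad = [], []
    for a, b, c in combinations(words, 3):
        if satisfies_triangle(dist(a, b), dist(b, c), dist(a, c)):
            ok.append((a, b, c))
        else:
            bad.append((a, b, c))
    return ok, bad


# Hypothetical synonym set with made-up similarity scores.
toy_words = ["w1", "w2", "w3", "w4"]
toy_similarity = {
    ("w1", "w2"): 0.9,
    ("w2", "w3"): 0.8,
    ("w1", "w3"): 0.2,  # suspiciously low for two words in one synonym set
    ("w1", "w4"): 0.85,
    ("w2", "w4"): 0.9,
    ("w3", "w4"): 0.75,
}

ok, bad = check_synonym_set(toy_words, toy_similarity)
print(f"{len(ok)} of {len(ok) + len(bad)} triples satisfy the triangle inequality")
print("violating triples:", bad)
```

In this toy data, every violating triple contains the pair (`w1`, `w3`), which hints at how violations might point to a questionable thesaurus entry; the paper's own error-detection procedure may differ.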