The earth mover's distance as a semantic measure for document similarity.

Xiaojun Wan, Yuxin Peng
DOI: https://doi.org/10.1145/1099554.1099637
2005-01-01
Abstract: Different words are usually assumed to be semantically independent in most existing similarity measures, which is often not true in practice. The semantic relatedness between words cannot be conveniently employed in the existing measures. We propose a novel similarity measure based on the earth mover's distance (EMD). In the proposed measure, the semantic distances between words are computed from the electronic lexical database WordNet, and the EMD is then employed to calculate document similarity with a many-to-many matching between words. Experimental results demonstrate the effectiveness of the proposed similarity measure.
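The many-to-many matching described in the abstract can be illustrated with a small sketch. It treats each document as a set of weighted words and solves the standard EMD transportation problem as a linear program. Several things here are assumptions rather than details from the paper: the function name `emd`, the use of SciPy's `linprog` as the solver, the word weights (normalized term frequencies would be one option), and the hand-written ground-distance matrix, which stands in for the WordNet-based word distances.

```python
import numpy as np
from scipy.optimize import linprog


def emd(weights_a, weights_b, dist):
    """Earth mover's distance between two weighted word sets.

    dist[i][j] is the ground distance between word i of document A
    and word j of document B (hypothetical values in this sketch;
    the paper derives them from WordNet).
    """
    w_a = np.asarray(weights_a, dtype=float)
    w_b = np.asarray(weights_b, dtype=float)
    d = np.asarray(dist, dtype=float)
    m, n = d.shape

    # One non-negative flow variable per word pair, flattened row-major.
    cost = d.ravel()

    # Word i of A ships at most w_a[i]; word j of B receives at most w_b[j].
    a_ub = np.zeros((m + n, m * n))
    for i in range(m):
        a_ub[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        a_ub[m + j, j::n] = 1.0
    b_ub = np.concatenate([w_a, w_b])

    # The total flow must equal the smaller of the two total weights.
    total = min(w_a.sum(), w_b.sum())
    a_eq = np.ones((1, m * n))
    b_eq = [total]

    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)

    # Normalize the minimal transport cost by the total flow.
    return res.fun / total


# Toy example: A = ["car", "road"], B = ["automobile", "street"].
# The distances are made up for illustration, not taken from WordNet.
toy_dist = [[0.1, 0.8],
            [0.9, 0.2]]
toy_emd = emd([0.5, 0.5], [0.5, 0.5], toy_dist)

# Assumption: with ground distances in [0, 1], one simple way to turn
# the distance into a similarity is 1 - EMD.
toy_similarity = 1.0 - toy_emd
```

In the toy example the cheapest plan moves each word's weight to its near-synonym, so the distance should come out near 0.15. When one word has to be split across several words in the other document, the flow variables allow exactly that, which is what makes the matching many-to-many rather than one-to-one.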