A Multi-Level Matching Method with Hybrid Similarity for Document Retrieval

Haijun Zhang, Tommy W. S. Chow
DOI: https://doi.org/10.1016/j.eswa.2011.08.128
IF: 8.5
2011-01-01
Expert Systems with Applications
Abstract: This paper presents a multi-level matching method for document retrieval (DR) that uses a hybrid document similarity. Documents are represented by a multi-level structure comprising a document level and a paragraph level. This multi-level representation is designed to model the underlying semantics more flexibly and accurately than conventional flat term histograms, which struggle to capture them. Matching between documents is then formulated as an optimization problem based on the Earth Mover’s Distance (EMD). A hybrid similarity combines the global and local semantics of the documents to improve retrieval accuracy. We performed an extensive experimental study to verify the method. The results suggest that the proposed method works well for lengthy documents with evident spatial distributions of terms.
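The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: terms are whitespace tokens, paragraph weights are proportional to paragraph length, the ground distance between paragraphs is one minus cosine similarity, and the hybrid score is a linear blend controlled by a hypothetical parameter `alpha`. The EMD is computed with a small successive-shortest-path min-cost flow so that the example needs only the standard library; the function names `emd` and `hybrid_similarity` are likewise hypothetical.

```python
import math
from collections import Counter
from fractions import Fraction


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def emd(wp, wq, cost):
    """Earth Mover's Distance between weighted sets, via min-cost flow.

    wp, wq: lists of Fraction weights; cost[i][j]: ground distance.
    """
    m, n = len(wp), len(wq)
    src, snk, size = m + n, m + n + 1, m + n + 2
    graph = [[] for _ in range(size)]  # node -> indices into edges
    edges = []  # [to, residual capacity, cost]; edge e ^ 1 is the reverse of e

    def add_edge(u, v, cap, c):
        graph[u].append(len(edges))
        edges.append([v, cap, c])
        graph[v].append(len(edges))
        edges.append([u, Fraction(0), -c])

    for i in range(m):
        add_edge(src, i, Fraction(wp[i]), 0.0)
        for j in range(n):
            add_edge(i, m + j, min(Fraction(wp[i]), Fraction(wq[j])), cost[i][j])
    for j in range(n):
        add_edge(m + j, snk, Fraction(wq[j]), 0.0)

    total_flow, total_cost = Fraction(0), 0.0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [math.inf] * size
        prev = [None] * size
        dist[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    v, cap, c = edges[e]
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + c, e, True
            if not changed:
                break
        if prev[snk] is None:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push it.
        path, v = [], snk
        while v != src:
            path.append(prev[v])
            v = edges[prev[v] ^ 1][0]
        push = min(edges[e][1] for e in path)
        for e in path:
            edges[e][1] -= push
            edges[e ^ 1][1] += push
        total_flow += push
        total_cost += float(push) * dist[snk]
    return total_cost / float(total_flow) if total_flow else 0.0


def hybrid_similarity(doc_a, doc_b, alpha=0.5):
    """Blend of global (document-level) and local (paragraph-level) similarity.

    doc_a, doc_b: non-empty lists of non-empty paragraph strings.
    alpha: assumed blending weight, not taken from the paper.
    """
    pa = [Counter(p.lower().split()) for p in doc_a]
    pb = [Counter(p.lower().split()) for p in doc_b]
    global_sim = cosine(sum(pa, Counter()), sum(pb, Counter()))
    # Assumed weighting: each paragraph's share of the document's terms.
    wa = [Fraction(sum(p.values()), sum(sum(q.values()) for q in pa)) for p in pa]
    wb = [Fraction(sum(p.values()), sum(sum(q.values()) for q in pb)) for p in pb]
    # Assumed ground distance: 1 - cosine, clamped against rounding error.
    cost = [[max(0.0, 1.0 - cosine(x, y)) for y in pb] for x in pa]
    local_sim = 1.0 - emd(wa, wb, cost)
    return alpha * global_sim + (1.0 - alpha) * local_sim
```

For instance, `hybrid_similarity(["a b c", "d e f"], ["d e f", "a b c"])` compares two documents with the same paragraphs in a different order. Because the EMD matches paragraphs regardless of position, both the global and the local component treat the documents as equivalent under these assumptions.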