Modeling Local Word Spatial Configurations for Near Duplicate Document Image Retrieval

Li Liu, Yue Lu, Ching Y. Suen, Jinhua Xu
DOI: https://doi.org/10.1109/icdar.2013.54
2013-01-01
Abstract: This paper addresses near duplicate document image retrieval with an approach that not only encodes each individual word in the image but also models its local spatial configuration. Each word in the image is represented as a string describing its shape characteristics, and a lexicon is first learnt from a training set. A word in an arbitrary document image can then be soft-assigned to a weighted combination of its several nearest neighbors in the lexicon. The rationale behind soft assignment is to tolerate the distortions induced by character segmentation, which is error-prone in degraded document images. Most importantly, we look beyond the single word and capture the local spatial configuration of each word, which plays a very important role in human perception. This configuration provides much more useful information for discriminating between different document images than the single word alone. A graph, chosen for its strong representational power, is built for each word to model its relationships with the words in its local neighborhood. The local word spatial configurations are integrated into the inverted file index structure to achieve scalable retrieval, so the retrieval of near duplicate document images is formulated as a voting problem. Experimental results on 45,000 document images demonstrate that the proposed approach brings significant improvements in the successful retrieval of near duplicate images.
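The soft-assignment and voting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the shape coding, the string distance, or the weighting scheme, so the sketch assumes Levenshtein distance between shape-code strings and exponentially decaying weights normalized to sum to 1. All function names and the parameters k and sigma are hypothetical. The per-word spatial graph is omitted because the abstract gives no details of its construction.

```python
import math
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two shape-code strings (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def soft_assign(word, lexicon, k=3, sigma=1.0):
    """Map a word to its k nearest lexicon entries.

    Returns {lexicon_index: weight}; the weights sum to 1.
    The exponential kernel is an assumption, not taken from the paper.
    """
    nearest = sorted(
        (edit_distance(word, entry), idx) for idx, entry in enumerate(lexicon)
    )[:k]
    raw = {idx: math.exp(-dist / sigma) for dist, idx in nearest}
    total = sum(raw.values())
    return {idx: w / total for idx, w in raw.items()}


def build_index(docs, lexicon, k=3):
    """Build an inverted file: lexicon index -> list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, words in docs.items():
        for word in words:
            for idx, weight in soft_assign(word, lexicon, k).items():
                index[idx].append((doc_id, weight))
    return index


def query(words, index, lexicon, k=3):
    """Accumulate votes per document and rank by descending score."""
    votes = defaultdict(float)
    for word in words:
        for idx, w_query in soft_assign(word, lexicon, k).items():
            for doc_id, w_doc in index.get(idx, []):
                votes[doc_id] += w_query * w_doc
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)
```

Because a query word votes through several nearby lexicon entries rather than only its single closest one, a word whose string is slightly corrupted by a segmentation error can still contribute to the correct document's score.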