Keyword searching in compressed document images

Yue Lu, Chew Lim Tan
DOI: https://doi.org/10.1109/DCC.2003.1194056
2003-01-01
Abstract: Summary form only given. A compressed pattern matching method is presented for searching keywords in CCITT Group 4 compressed document images without explicit decompression. Under the CCITT Group 4 standard, each coded position indicates that the current pixel's color differs from that of the previous pixel, except for the coded position that follows a pass mode. The changing elements are extracted from the compressed images and then used to segment and bound the word objects and to measure the similarity of two word images. A two-stage matching strategy is constructed to measure the dissimilarity between the template image of the user's query word and a word extracted from the document images. Experiments were conducted to verify the validity of the approach. The results show that the proposed approach is much faster than the traditional approach because it avoids the pixel-level processing needed to analyze connected components and extract word features.
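To make the notion of changing elements concrete, here is a minimal Python sketch. It is not the authors' implementation: the paper obtains changing elements directly from the Group 4 code stream, whereas this illustration derives them from an already decoded binary row purely to show the definition. The function names (`changing_elements`, `black_runs`, `bounding_box`) and the 0 = white, 1 = black convention are assumptions made for this example.

```python
from typing import List, Optional, Tuple

WHITE, BLACK = 0, 1  # assumed pixel convention for this sketch


def changing_elements(row: List[int]) -> List[int]:
    """Return the columns where the pixel color differs from the previous pixel.

    An imaginary white pixel is assumed before the first column, so the
    first changing element always marks the start of a black run.
    """
    positions = []
    previous = WHITE
    for x, pixel in enumerate(row):
        if pixel != previous:
            positions.append(x)
            previous = pixel
    return positions


def black_runs(row: List[int]) -> List[Tuple[int, int]]:
    """Pair up changing elements into (start, end) black runs, end exclusive."""
    ce = changing_elements(row)
    if len(ce) % 2 == 1:
        # The row ends inside a black run; close it at the row width.
        ce.append(len(row))
    return [(ce[i], ce[i + 1]) for i in range(0, len(ce), 2)]


def bounding_box(rows: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Bound all black pixels as (left, top, right, bottom).

    `right` is exclusive and `bottom` is the index of the last row that
    contains black pixels. Returns None for an all-white image. Only the
    changing elements of each row are inspected, not individual pixels.
    """
    left = top = right = bottom = None
    for y, row in enumerate(rows):
        runs = black_runs(row)
        if not runs:
            continue
        if top is None:
            top = y
        bottom = y
        row_left, row_right = runs[0][0], runs[-1][1]
        left = row_left if left is None else min(left, row_left)
        right = row_right if right is None else max(right, row_right)
    if top is None:
        return None
    return (left, top, right, bottom)
```

Once the changing elements are known, the horizontal extent of each row comes from its first and last changing element alone, which hints at why working on changing elements can avoid pixel-level scans when bounding word objects.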