Lossy JBIG2 Based on Optical Character Recognition

SHANG Junqing, LIU Changsong, DING Xiaoqing
DOI: https://doi.org/10.3321/j.issn:1000-0054.2006.07.017
2006-01-01
Abstract: Bi-level image coding is useful for document storage and archiving, for image searches on the Internet, and for digital libraries. The JBIG2 (Joint Bi-level Image Experts Group) standard for lossless and lossy coding of bi-level images is a very flexible encoding framework that allows researchers to design their own encoders. OCR processing of text images yields recognition results together with measurable confidence values. We propose a lossy JBIG2 encoding method that uses OCR results to improve pattern-matching-based compression of text images. Every character in the image that is recognized with high confidence is replaced by a representative character image, so the encoder only needs to record the positions of these characters. Experimental results show that this method outperforms previous JBIG2 encoding methods, requiring 14.3% less storage than previous lossless methods while preserving relatively good text image quality.
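The replacement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' encoder: the `Glyph` fields, the `encode_page` function, the 0.9 confidence threshold, and the choice of the first confident instance as the representative image are all assumptions made for illustration, and no actual JBIG2 bitstream is produced.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A bi-level bitmap as rows of 0/1 pixels (hypothetical representation).
Bitmap = Tuple[Tuple[int, ...], ...]


@dataclass
class Glyph:
    """One segmented character with its OCR result (hypothetical fields)."""
    label: str         # character class reported by OCR
    confidence: float  # OCR confidence in [0, 1]
    x: int             # position of the character on the page
    y: int
    bitmap: Bitmap     # original pixels of this character


def encode_page(glyphs: List[Glyph], threshold: float = 0.9):
    """Split glyphs into a symbol dictionary, placements, and a residual.

    Confidently recognized glyphs share one representative bitmap per
    label, so only (label, x, y) is stored for each occurrence.
    Glyphs below the threshold keep their original bitmaps.
    """
    dictionary: Dict[str, Bitmap] = {}
    placements: List[Tuple[str, int, int]] = []
    residual: List[Glyph] = []
    for g in glyphs:
        if g.confidence >= threshold:
            # Assumption: the first confident instance is the representative.
            dictionary.setdefault(g.label, g.bitmap)
            placements.append((g.label, g.x, g.y))
        else:
            residual.append(g)
    return dictionary, placements, residual


# Two confident instances of "a" with slightly different pixels,
# and one low-confidence glyph that must be kept as-is.
a1 = Glyph("a", 0.97, 10, 20, ((0, 1), (1, 1)))
a2 = Glyph("a", 0.95, 40, 20, ((1, 1), (1, 1)))
b1 = Glyph("b", 0.40, 70, 20, ((1, 0), (1, 1)))

dictionary, placements, residual = encode_page([a1, a2, b1])
print(len(dictionary), placements, len(residual))
```

In this sketch the second "a" is rendered with the first instance's bitmap when decoded, which is where the loss comes from. A real encoder would also have to choose how the representative image is formed and how the low-confidence residual is coded.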