JBIG2 Text Image Compression Based on OCR

Junqing Shang, Changsong Liu, Xiaoqing Ding
DOI: https://doi.org/10.1117/12.641557
2006-01-01
Abstract: The JBIG2 (Joint Bi-level Image Experts Group) standard for bi-level image coding leaves encoder design open to individual implementers. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) that compresses bi-level images into the JBIG2 format. Processing a text image with OCR yields a recognition result for each character together with a confidence value for that result. For each set of similar character image blocks, a representative symbol image is generated using the OCR results, the block sizes, and the mismatches between blocks. This symbol image replaces all the similar image blocks, which yields a high compression ratio. Experimental results show that our algorithm achieves improvements of 75.86% over lossless soft pattern matching (SPM) and 14.05% over lossy pattern matching and substitution (PM&S) on Latin character images, and 37.9% over lossless SPM and 4.97% over lossy PM&S on Chinese character images. Our algorithm produces far fewer substitution errors than previous lossy PM&S and thus preserves acceptable decoded image quality.
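The grouping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Block` fields, the thresholds `min_conf` and `max_mismatch`, the exact-size requirement, and the majority-vote rule for building the representative symbol are all assumptions made for illustration. Blocks are grouped only when OCR assigns them the same label with high confidence, they have the same size, and their pixel mismatch is small.

```python
from dataclasses import dataclass
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels


@dataclass
class Block:
    label: str         # OCR recognition result (hypothetical field)
    confidence: float  # OCR confidence, assumed to lie in [0, 1]
    pixels: Bitmap     # bi-level character image


def same_size(a: Bitmap, b: Bitmap) -> bool:
    return len(a) == len(b) and len(a[0]) == len(b[0])


def mismatch_ratio(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that differ between two same-sized bitmaps."""
    total = len(a) * len(a[0])
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def group_blocks(blocks: List[Block],
                 min_conf: float = 0.9,
                 max_mismatch: float = 0.2) -> List[List[Block]]:
    """Greedy grouping; thresholds are illustrative assumptions."""
    groups: List[List[Block]] = []
    for blk in blocks:
        placed = False
        if blk.confidence >= min_conf:
            for group in groups:
                ref = group[0]
                if (ref.confidence >= min_conf
                        and ref.label == blk.label
                        and same_size(ref.pixels, blk.pixels)
                        and mismatch_ratio(ref.pixels, blk.pixels) <= max_mismatch):
                    group.append(blk)
                    placed = True
                    break
        if not placed:
            # Low-confidence or dissimilar blocks stay in their own group.
            groups.append([blk])
    return groups


def representative(group: List[Block]) -> Bitmap:
    """Per-pixel majority vote over the group (an assumed rule)."""
    h, w, n = len(group[0].pixels), len(group[0].pixels[0]), len(group)
    return [[1 if 2 * sum(b.pixels[y][x] for b in group) > n else 0
             for x in range(w)]
            for y in range(h)]
```

In a JBIG2 encoder, each representative bitmap would be stored once in the symbol dictionary and every block in its group would be coded as a reference to it. Requiring OCR agreement before merging is what would keep visually similar but different characters apart.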