A Document Image Quality Assessment Algorithm Based on Information Entropy in Text Region

Zongrui Zhang, Jian Qiu, Hao He
DOI: https://doi.org/10.1007/978-3-031-20738-9_72
2023-01-01
Abstract: Image quality is critical to Optical Character Recognition (OCR): poor-quality images lead OCR to produce unreliable results. Practical OCR-based applications encounter a relatively high ratio of low-quality images, so effectively evaluating image quality and filtering out unqualified images with document image quality assessment (DIQA) algorithms is a major challenge in these scenarios. Current DIQA algorithms mainly focus on overall image features rather than the text region, even though the quality of the text region is the dominant factor for OCR. In this paper, we propose a document image quality assessment algorithm based on the information entropy of the text region of the image. Our algorithmic framework consists mainly of three networks that detect, extract, and evaluate the text regions in an image, respectively. We build a quality prediction network based on HyperNet and use the information entropy of the text region as the score weight, so that the final score better reflects the quality of the text region. Finally, test results on the benchmark dataset SmartDoc-QA and our constructed dataset DocImage1k demonstrate that the proposed algorithm achieves excellent performance.
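The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea of using text-region entropy as a score weight. It assumes that each detected text region is an 8-bit grayscale patch, that entropy means the Shannon entropy of the patch's gray-level histogram, and that the final score is an entropy-weighted average of per-region quality scores. The function names `region_entropy` and `entropy_weighted_score` are hypothetical, and the per-region scores stand in for the output of the quality prediction network.

```python
import numpy as np


def region_entropy(patch: np.ndarray) -> float:
    """Shannon entropy (in bits) of the gray-level histogram of an 8-bit patch.

    Assumption: the paper's exact entropy definition is not stated in the
    abstract; a plain histogram entropy is used here for illustration.
    """
    hist, _ = np.histogram(patch, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so that log2 is well defined
    return float(-np.sum(p * np.log2(p)))


def entropy_weighted_score(scores, entropies) -> float:
    """Combine per-region quality scores using normalized entropies as weights.

    Assumption: a weighted average is one plausible way to apply a
    "score weight"; the paper may combine the values differently.
    """
    scores = np.asarray(scores, dtype=float)
    entropies = np.asarray(entropies, dtype=float)
    total = entropies.sum()
    if total == 0:
        # All regions carry no information: fall back to a plain mean.
        return float(scores.mean())
    return float(np.dot(entropies / total, scores))


# Hypothetical usage: two text regions with scores from a quality predictor.
flat = np.zeros((8, 8), dtype=np.uint8)            # uniform patch
mixed = np.array([[0, 255], [0, 255]], dtype=np.uint8)  # two gray levels
weights = [region_entropy(flat), region_entropy(mixed)]
final = entropy_weighted_score([0.2, 0.8], weights)
```

Under these assumptions, a region with richer gray-level content (more text detail) contributes more to the final score than a nearly uniform region, which matches the abstract's stated goal of making the score reflect text-region quality.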