An Efficient Transformer–CNN Network for Document Image Binarization

Lina Zhang, Kaiyuan Wang, Yi Wan

DOI: https://doi.org/10.3390/electronics13122243

IF: 2.9

2024-06-08

Electronics

Abstract: Color image binarization plays a pivotal role in image preprocessing and significantly affects subsequent tasks, particularly text recognition. This paper concentrates on document image binarization (DIB), which aims to separate an image into foreground (text) and background (non-text content). We analyze conventional and deep-learning-based approaches and conclude that prevailing DIB methods rely on deep learning. Furthermore, we examine the network's receptive fields before and after training to underscore the advantages of the Transformer model. We then introduce a lightweight model based on the U-Net structure and enhanced with the MobileViT module to better capture global information in document images. Because it learns both local and global features, the proposed model demonstrates competitive performance on two standard datasets (DIBCO2012 and DIBCO2017) and good robustness on the DIBCO2019 dataset. Notably, the proposed method is a straightforward end-to-end model that requires no additional image preprocessing or post-processing and does not use ensemble models. Moreover, its parameter count is less than one-eighth that of the model that achieves the best results on most DIBCO datasets. Finally, two sets of ablation experiments are conducted to verify the effectiveness of the proposed binarization model.

engineering, electrical & electronic; computer science, information systems; physics, applied

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper addresses Document Image Binarization (DIB): separating the text (foreground) from the background (non-text content) in document images. The goal is to convert the image into a "black text on white paper" format, i.e., setting foreground pixels to 0 and background pixels to 255.

#### Research Background and Challenges

- **Degraded Document Processing**: Ancient documents are often severely degraded, for example by yellowed paper and ink contamination. Manually processing large amounts of historical text is time-consuming, labor-intensive, and error-prone.
- **Limitations of Existing Methods**: Traditional binarization methods (such as the Otsu algorithm and the Niblack method) perform poorly on low-contrast or unevenly illuminated images. Deep-learning-based methods perform better but still fall short on complex background textures.

#### Proposed Method

- **Combining U-Net and Transformer**: A lightweight model based on the U-Net structure and incorporating the MobileViT module is proposed to better capture global information in document images.
- **Model Characteristics**: The model is small, with a parameter count less than one-eighth that of the model achieving the best results on most DIBCO datasets, and it learns both local and global features.
- **Experimental Results**: The model achieves competitive performance on two standard datasets (DIBCO2012 and DIBCO2017) and shows good robustness on the DIBCO2019 dataset.

By introducing the MobileViT module, the model improves document image binarization performance while remaining efficient.
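To make the 0/255 output convention concrete, the following is a minimal sketch of the classical Otsu baseline named among the traditional methods. It is not the paper's model: it is a single global threshold computed with NumPy. The function names `otsu_threshold` and `binarize` are illustrative, and the input is assumed to be an 8-bit grayscale array with dark text on a lighter background.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold t for an 8-bit grayscale image.

    Pixels with value <= t form the dark class. The threshold is chosen
    to maximize the between-class variance of the two classes.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()

    omega = np.cumsum(prob)                  # weight of the dark class for each t
    mu = np.cumsum(prob * np.arange(256))    # cumulative (unnormalized) mean
    mu_total = mu[-1]

    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # one class is empty: no separation

    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Map dark pixels (assumed text) to 0 and the rest (background) to 255."""
    t = otsu_threshold(gray)
    return np.where(gray <= t, 0, 255).astype(np.uint8)
```

Because one threshold is applied to the whole image, a sketch like this is exactly the kind of method that struggles on low-contrast or unevenly illuminated pages, which is the limitation that motivates learned models.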

DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization

CTFCD: Channel Transformer Based on Full Convolutional Decoder for Single Image Deraining

A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement

An Iterative Refinement Framework for Image Document Binarization with Bhattacharyya Similarity Measure

Document Image Binarization with Fully Convolutional Neural Networks

A Fair Evaluation of Various Deep Learning-Based Document Image Binarization Approaches

GDB: Gated convolutions-based Document Binarization

BiViT: Extremely Compressed Binary Vision Transformers

BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models

CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization

A Dynamic Network with Transformer for Image Denoising

Binarizing Documents by Leveraging both Space and Frequency

BiT: Robustly Binarized Multi-distilled Transformer

DBCvT: Double Branch Convolutional Transformer for Medical Image Classification

BiNet: Degraded-Manuscript Binarization in Diverse Document Textures and Layouts using Deep Encoder-Decoder Networks

DBCTNet: Double Branch Convolution-Transformer Network for Hyperspectral Image Classification

Deep Networks for Degraded Document Image Binarization through Pyramid Reconstruction

VisionTwinNet: Gated Clarity Enhancement Paired With Light-Robust CD Transformers

GSB: Group superposition binarization for vision transformer with limited training samples

An enhanced binarization framework for degraded historical document images