Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel

Zhihao Wang, Yanwei Yu, Yibo Wang, Haixu Long, Fazheng Wang
DOI: https://doi.org/10.48550/arXiv.2107.01547
2021-07-04
Abstract: Offline Chinese handwriting text recognition is a long-standing research topic in the field of pattern recognition. In previous studies, text detection and recognition are separated, which makes text recognition highly dependent on the detection results. In this paper, we propose a robust end-to-end Chinese text page spotter framework. It unifies text detection and text recognition with a text kernel that integrates global text feature information to optimize recognition at multiple scales, which reduces the dependence on detection and improves the robustness of the system. Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and the ICDAR-2013 competition dataset. Without any language model, the correct rates are 99.12% and 94.27% for line-level recognition, and 99.03% and 94.20% for page-level recognition, respectively.
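The correct rates quoted above follow, in the CASIA-HWDB and ICDAR-2013 literature, a standard definition based on an edit-distance alignment between the recognized string and the ground truth: with N ground-truth characters and S substitution, D deletion and I insertion errors, the correct rate is CR = (N − S − D)/N and the accurate rate is AR = (N − S − D − I)/N. The minimal Python sketch below assumes that definition; the paper's own evaluation script is not shown here, and the function names are hypothetical.

```python
def edit_ops(ref: str, hyp: str) -> tuple[int, int, int]:
    """Count (substitutions, deletions, insertions) in one minimum-cost alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion: ref char missing from hyp
                           dp[i][j - 1] + 1)         # insertion: extra char in hyp
    # Walk back from the corner, preferring the diagonal move on ties.
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                s += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return s, d, ins


def correct_and_accurate_rate(ref: str, hyp: str) -> tuple[float, float]:
    """CR = (N - S - D) / N and AR = (N - S - D - I) / N, with N = len(ref)."""
    if not ref:
        raise ValueError("ground truth must be non-empty")
    s, d, ins = edit_ops(ref, hyp)
    n = len(ref)
    return (n - s - d) / n, (n - s - d - ins) / n


# One substitution (本 -> 字) and one inserted 别: CR = 5/6, AR = 4/6.
cr, ar = correct_and_accurate_rate("手写文本识别", "手写文字识别别")
```

When several minimum-cost alignments exist, the split between S, D and I depends on tie-breaking (this sketch prefers the diagonal move), so scores can differ slightly from other scorers. For page-level scores, the error counts are typically summed over all lines before dividing by the total number of ground-truth characters.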
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses several major challenges in offline Chinese handwritten text recognition:

1. **Character Diversity**: Chinese has a very large character set, which makes recognition difficult.
2. **Writing Style Diversity**: Writing styles vary greatly between writers, from sloppy to neat, which places higher demands on recognition algorithms.
3. **Cursive Connections Between Characters**: In handwritten text, adjacent characters are often joined by cursive strokes, which blurs character boundaries and makes recognition harder still.

To solve these problems, the paper proposes a new end-to-end Chinese text page spotting framework with the following main features:

- **Unified Detection and Recognition**: Traditional Chinese handwritten text recognition methods usually perform text detection and recognition separately, which makes the recognition results highly dependent on the detection results. The proposed framework unifies detection and recognition by introducing a text kernel, which uses global text feature information to optimize recognition, reduces the dependence on detection results, and improves the robustness of the system.
- **Multi-scale Information Fusion**: Recognition accuracy is improved by fusing information at multiple scales. Specifically, a Temporal Convolutional Network (TCN) and a self-attention mechanism are used to extract multi-scale text information.
- **Text Kernel Segmentation**: Text kernel segmentation concentrates the feature information of each text line in its core region, so the text content can be recognized correctly even when the detection box is not perfectly accurate.
- **Centerline Alignment**: A centerline alignment step uses a Thin-Plate Spline (TPS) transformation to rectify irregular text lines into rectangles, which improves recognition accuracy.

The experimental results show that this method achieves state-of-the-art performance on the CASIA-HWDB2.0-2.2 dataset and the ICDAR-2013 competition dataset. Without any language model, the correct rates reach 99.12% and 94.27% for line-level recognition and 99.03% and 94.20% for page-level recognition, respectively. These results verify the effectiveness and robustness of the proposed method.
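The multi-scale fusion bullet above pairs a Temporal Convolutional Network (TCN) with self-attention. As a rough NumPy illustration of why that combination covers several scales, the sketch below stacks three dilated 1D convolutions (dilations 1, 2 and 4, so the receptive field grows to 15 steps) over a feature sequence taken along a text line, then applies single-head scaled dot-product self-attention so every position can attend to the whole line. Everything here is an assumption rather than the paper's architecture: the sizes, the dilations, the residual/ReLU wiring, the use of centered (non-causal) convolutions because an offline line is fully visible, and the random weights standing in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)


def dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Centered ('same') dilated 1D convolution. x: (T, C_in), w: (K, C_in, C_out), K odd."""
    k = w.shape[0]
    assert k % 2 == 1, "odd kernel size keeps the output centered"
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    t = x.shape[0]
    out = np.zeros((t, w.shape[2]))
    for i in range(k):
        # tap i reads the input shifted by (i * dilation - pad) steps
        out += xp[i * dilation : i * dilation + t] @ w[i]
    return out


def receptive_field(kernel_size: int, dilations) -> int:
    """Number of input steps seen by one output step of the stacked convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray):
    """Single-head scaled dot-product self-attention over the sequence x: (T, C)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row is a softmax
    return attn @ v, attn


T, C = 40, 16  # hypothetical: 40 steps along a text line, 16 feature channels
x = rng.standard_normal((T, C))
h = x
for d in (1, 2, 4):
    w = rng.standard_normal((3, C, C)) * 0.1
    h = h + np.maximum(dilated_conv1d(h, w, d), 0.0)  # residual block with ReLU
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
out, attn = self_attention(h, wq, wk, wv)
```

The dilated convolutions widen local context step by step, while attention supplies line-wide context in a single step; that division of labour is a common motivation for combining the two.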
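The text kernel segmentation bullet above does not say how the kernel region is derived. One common construction, used by the PSENet text detector and detectors derived from it, shrinks each text polygon inward by an offset d = A(1 − r²)/L, where A is the polygon's area, L its perimeter and r a shrink ratio. The Python sketch below applies that formula to axis-aligned boxes only, with a hypothetical ratio of 0.4 and hypothetical helper names; whether this paper builds its kernels the same way is an assumption. It illustrates the robustness argument: a detection box that is offset from the ground truth can miss the edges of the full line yet still cover its kernel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x0, y0) top-left, (x1, y1) bottom-right, in pixels."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    @property
    def perimeter(self) -> float:
        return 2 * ((self.x1 - self.x0) + (self.y1 - self.y0))


def shrink_to_kernel(box: Box, ratio: float = 0.4) -> Box:
    """PSENet-style shrink (assumed): move every side inward by d = A * (1 - r^2) / L."""
    d = box.area * (1 - ratio ** 2) / box.perimeter
    return Box(box.x0 + d, box.y0 + d, box.x1 - d, box.y1 - d)


def contains(outer: Box, inner: Box) -> bool:
    """True if inner lies entirely inside outer."""
    return (outer.x0 <= inner.x0 and outer.y0 <= inner.y0
            and outer.x1 >= inner.x1 and outer.y1 >= inner.y1)


gt = Box(0, 0, 1000, 60)        # hypothetical ground-truth text line
kernel = shrink_to_kernel(gt)   # d = 60000 * 0.84 / 2120, about 23.8 px
det = Box(15, 10, 1010, 70)     # imprecise detection, shifted right and down
covers_kernel = contains(det, kernel)  # the kernel is still fully covered
covers_line = contains(det, gt)        # but the full line is not
```

For a rectangle the offset is always smaller than half its shorter side when 0 < r ≤ 1, so the kernel never collapses. For arbitrary polygons, such detectors perform the same inward offset with a polygon-clipping library instead of moving box edges.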
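The centerline alignment bullet above rectifies irregular text lines with a Thin-Plate Spline (TPS). A minimal NumPy sketch of the TPS itself follows: it fits a mapping from control points on a target rectangle to matching points on a curved text line, then maps a dense grid over the rectangle back into page coordinates, which is where pixels would be sampled. Placing the control points along the line's upper and lower boundaries, the arc-shaped example line, and all function names are assumptions; how the paper places its control points relative to the centerline is not described here, and the final bilinear sampling of pixels is omitted.

```python
import numpy as np


def _tps_kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """U(r) = r^2 log(r^2) between every row of a (n, 2) and b (m, 2), with U(0) = 0."""
    r2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.where(r2 > 0, r2 * np.log(np.where(r2 > 0, r2, 1.0)), 0.0)


def fit_tps(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve for TPS parameters mapping control points src (n, 2) onto dst (n, 2).

    Returns an (n + 3, 2) array: n kernel weights followed by 3 affine terms.
    """
    n = src.shape[0]
    p = np.hstack([np.ones((n, 1)), src])  # (n, 3): rows [1, x, y]
    lhs = np.zeros((n + 3, n + 3))
    lhs[:n, :n] = _tps_kernel(src, src)
    lhs[:n, n:] = p
    lhs[n:, :n] = p.T                      # side conditions: weights orthogonal to affine part
    rhs = np.vstack([dst, np.zeros((3, 2))])
    return np.linalg.solve(lhs, rhs)


def apply_tps(params: np.ndarray, src: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map query points pts (m, 2) through the TPS fitted on control points src."""
    n = src.shape[0]
    p = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return _tps_kernel(pts, src) @ params[:n] + p @ params[n:]


# Hypothetical example: rectify an arc-shaped line into a 100 x 20 rectangle.
xs = np.linspace(0, 100, 5)
rect = np.vstack([np.stack([xs, np.zeros(5)], 1), np.stack([xs, np.full(5, 20.0)], 1)])
bend = 0.004 * (xs - 50) ** 2
curved = np.vstack([np.stack([xs, 30 + bend], 1), np.stack([xs, 50 + bend], 1)])
params = fit_tps(rect, curved)  # maps rectangle coordinates -> page coordinates

gx, gy = np.meshgrid(np.linspace(0, 100, 101), np.linspace(0, 20, 21))
grid = np.stack([gx.ravel(), gy.ravel()], 1)
src_coords = apply_tps(params, rect, grid)  # where each output pixel samples the page
```

Fitting the map from the output rectangle back into the page (rather than forward) gives every output pixel exactly one sampling location, which avoids holes in the rectified line.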