Abstract: The segmentation-free research efforts for addressing handwritten text recognition can be divided into three categories: connectionist temporal classification (CTC), hidden Markov model and encoder-decoder methods. In this paper, inspired by the above three modeling methods, we propose a new recognition network by using a novel three-dimensional (3D) attention module and global-local context information. Based on the feature maps of the last convolutional layer, a series of 3D blocks with different resolutions are split. Then, these 3D blocks are fed into the 3D attention module to generate sequential visual features. Finally, by integrating the visual features and the corresponding global-local context features, a well-designed representation can be obtained. Main canonical neural units including attention mechanisms, fully-connected layer, recurrent unit and convolutional layer are efficiently organized into a network and can be jointly trained by the CTC loss and the cross-entropy loss. Experiments on the latest Chinese handwritten text datasets (the SCUT-HCCDoc and the SCUT-EPT) and one English handwritten text dataset (the IAM) show that the proposed method can make a new milestone.
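The block-splitting step described in the abstract can be illustrated with a small sketch. This is not the paper's 3D attention module, whose details are not given here. It assumes a feature map stored as nested lists of shape C x H x W, assumes that "different resolutions" means different block widths, and uses a plain softmax attention pooling with a hypothetical query vector to turn each block into one feature vector. The names `split_into_blocks`, `attention_pool`, `multi_scale_sequences` and `query` are all illustrative.

```python
import math


def split_into_blocks(feature_map, block_width):
    """Cut a C x H x W feature map into C x H x block_width blocks along the width.

    Trailing columns that do not fill a whole block are dropped.
    """
    width = len(feature_map[0][0])
    blocks = []
    for start in range(0, width - block_width + 1, block_width):
        block = [[row[start:start + block_width] for row in channel]
                 for channel in feature_map]
        blocks.append(block)
    return blocks


def attention_pool(block, query):
    """Collapse one C x H x w block into a C-dimensional vector.

    Each spatial position is scored by the dot product between its channel
    vector and `query` (a hypothetical stand-in for learned parameters);
    the scores are softmax-normalised and used as pooling weights.
    """
    channels = len(block)
    positions = [(i, j)
                 for i in range(len(block[0]))
                 for j in range(len(block[0][0]))]
    scores = [sum(query[c] * block[c][i][j] for c in range(channels))
              for i, j in positions]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * block[c][i][j] for w, (i, j) in zip(weights, positions))
            for c in range(channels)]


def multi_scale_sequences(feature_map, block_widths, query):
    """Build one sequence of pooled vectors per block width."""
    return {bw: [attention_pool(b, query)
                 for b in split_into_blocks(feature_map, bw)]
            for bw in block_widths}


# Toy feature map with C=2, H=2, W=4.
feature_map = [
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]],
]
sequences = multi_scale_sequences(feature_map, [1, 2], query=[0.0, 0.0])
```

With an all-zero query the attention weights are uniform, so each block is simply averaged. Narrower blocks give a longer, finer-grained sequence, and wider blocks give a shorter, coarser one.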


### What problems does this paper attempt to solve?

This paper addresses several key problems in handwritten text recognition (HTR):

1. **Segmentation-free handwritten text recognition**
   - Handwritten text recognition is a typical sequence-to-sequence problem and can be formulated as a Bayesian decision problem. Traditional methods usually rely on character bounding boxes or additional training data, whereas segmentation-free methods need only text-level labels during training.
   - The paper focuses on three typical segmentation-free approaches: the hidden Markov model (HMM), connectionist temporal classification (CTC), and the encoder-decoder (ED) framework.
2. **Limitations of existing methods**
   - An HMM can represent characters at high resolution, but modeling the state posterior probabilities requires so many network output nodes that end-to-end training becomes difficult, and extending a 1D HMM to a 2D HMM is computationally expensive.
   - Early CTC and ED methods rely either on the local receptive fields of convolutional layers or on stacked pooling layers that gradually reduce the feature-map height to 1 pixel, which may lose information.
3. **Proposed solution**
   - To overcome these problems, the paper proposes a new recognition network built on a novel 3D attention module and global-local context information. The method explicitly extracts two-dimensional information from feature blocks with different resolutions.
   - With a multi-scale training strategy that combines the CTC loss and the cross-entropy loss, the proposed method achieves results comparable to existing state-of-the-art methods on multiple datasets.

### Main contributions

1. **Improved recognition network**: Inspired by typical segmentation-free methods, the paper improves the text recognition network by introducing the 3D attention module and global-local context information.
2. **Effective organization of neural units**: The main canonical neural units (attention mechanisms, fully-connected layers, recurrent units, and convolutional layers) are carefully organized into an efficient network structure.
3. **Multi-scale training strategy**: The paper proposes a multi-scale training method that extracts 3D blocks with different resolutions and trains jointly with the CTC loss and the cross-entropy loss.
4. **Experimental verification**: Experiments were carried out on the latest Chinese handwritten text datasets (SCUT-HCCDoc and SCUT-EPT) and on an English handwritten text dataset (IAM). The results show that the proposed method is comparable to state-of-the-art methods, and a comprehensive analysis verifies the effects of the 3D attention module and of the different features.

In summary, the paper aims to improve the performance of handwritten text recognition. Building on segmentation-free methods, it introduces a new network structure and training strategy to address the information loss and the difficulty of end-to-end training found in existing methods.
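The joint training with CTC loss and cross-entropy loss mentioned above can be sketched as a weighted sum of the two terms. The summary does not say how the paper weights or combines them, so the `ce_weight` parameter and the function names are assumptions for illustration only. The CTC term uses the standard forward algorithm in log space. The cross-entropy term assumes a decoder that emits one distribution per target symbol.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC.

    log_probs: T x V nested list of per-frame log-probabilities.
    target: list of label indices, none of them equal to `blank`.
    """
    # Interleave blanks: [a, b] becomes [blank, a, blank, b, blank].
    ext = [blank]
    for symbol in target:
        ext += [symbol, blank]
    size = len(ext)

    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][blank]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * size
        for s in range(size):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total = log_add(total, alpha[s - 1])  # advance by one
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total = log_add(total, alpha[s - 2])
            if total != NEG_INF:
                new_alpha[s] = total + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank.
    log_likelihood = alpha[-1]
    if size > 1:
        log_likelihood = log_add(log_likelihood, alpha[-2])
    return -log_likelihood


def cross_entropy_loss(log_probs, target):
    """Mean negative log-probability of each target symbol (L x V input)."""
    return -sum(log_probs[i][y] for i, y in enumerate(target)) / len(target)


def joint_loss(ctc_log_probs, dec_log_probs, target, ce_weight=1.0):
    """Weighted sum of both losses; `ce_weight` is an assumed hyperparameter."""
    return (ctc_loss(ctc_log_probs, target)
            + ce_weight * cross_entropy_loss(dec_log_probs, target))
```

In a real system both terms would be computed by a deep learning framework on batched tensors. The pure-Python version only makes the structure of the combined objective explicit.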

Integrating Canonical Neural Units and Multi-Scale Training for Handwritten Text Recognition
