Fully-Convolutional Intensive Feature Flow Neural Network for Text Recognition

Zhao Zhang, Zemin Tang, Zheng Zhang, Yang Wang, Jie Qin, Meng Wang
DOI: https://doi.org/10.48550/arXiv.1912.06446
2020-01-15
Abstract: Deep Convolutional Neural Networks (CNNs) have achieved great success in pattern recognition, such as recognizing text in images. However, existing CNN-based frameworks still have several drawbacks: 1) the traditional pooling operation may lose important feature information and is unlearnable; 2) the traditional convolution operation optimizes slowly, and the hierarchical features from different layers are not fully utilized. In this work, we address these problems by developing a novel deep network model called the Fully-Convolutional Intensive Feature Flow Neural Network (IntensiveNet). Specifically, we design a further dense block, called the intensive block, to extract feature information, in which the original inputs and two dense blocks are tightly connected. To encode data appropriately, we present the concepts of the dense fusion block and further dense fusion operations for our new intensive block. By adding short connections to different layers, the feature flow and coupling between layers are enhanced. We also replace the traditional convolution with depthwise separable convolution to make the operation efficient. To reduce the loss of important feature information, we use a convolution operation with stride 2 to replace the original pooling operation in the customary transition layers. The recognition results on large-scale Chinese string and MNIST datasets show that IntensiveNet can deliver enhanced recognition results compared with other related deep models.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses shortcomings of existing text recognition frameworks based on deep convolutional neural networks (CNNs). Specifically:

1. **Drawbacks of traditional pooling operations**: Traditional pooling operations may lose important feature information and are unlearnable. This may degrade model performance on images with complex backgrounds and contents.
2. **Inefficiency of traditional convolution operations**: Traditional convolution operations are slow to optimize, and the hierarchical features from different layers are not fully utilized. This limits the learning efficiency and feature extraction ability of the model.

To solve these problems, the authors propose a new deep network model, the **Fully-Convolutional Intensive Feature Flow Neural Network (IntensiveNet)**. The model enhances feature flow and inter-layer coupling by introducing intensive blocks, dense fusion blocks, and further dense fusion operations. In addition, the authors replace the traditional pooling operation with a convolution operation with a stride of 2 to reduce the loss of important feature information, and use depthwise separable convolution to improve the computational efficiency of the model.

### Specific improvement points

1. **Intensive Block**:
   - Two dense blocks are introduced, and the input features are tightly connected to them through short connections, which enhances feature flow and inter-layer coupling.
   - The dense fusion block and further dense fusion operations are proposed to strengthen feature representation learning.
2. **Replace pooling operations**:
   - A convolution operation with a stride of 2 is used instead of the traditional pooling operation, which reduces the loss of feature information and makes the parameters of the entire framework learnable.
3. **Improve model efficiency**:
   - Depthwise separable convolution is used instead of the standard convolution operation, which reduces the computational cost while maintaining similar performance.

With these improvements, the experimental results on the large-scale Chinese string and MNIST datasets show that IntensiveNet delivers better recognition results than other related deep models.

### Summary

This paper aims to solve the problems of feature information loss and low computational efficiency in existing text recognition frameworks by designing a new convolutional neural network structure, thereby improving the accuracy and efficiency of text recognition.
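
The efficiency and downsampling points above can be made concrete with a little arithmetic. The sketch below is not the authors' code. It uses the standard weight-count formulas for a k×k convolution and for a depthwise separable convolution (a per-channel k×k depthwise step followed by a 1×1 pointwise step), plus the usual output-size formula for a strided operation. The layer sizes (3×3 kernel, 64 input channels, 128 output channels, 32×32 feature map) are hypothetical values chosen only for illustration, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def output_size(size, k, stride, padding):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - k) // stride + 1


# Hypothetical layer sizes, not taken from the paper.
k, c_in, c_out, size = 3, 64, 128, 32

std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print("standard conv weights:      ", std)
print("depthwise separable weights:", sep)
print("reduction factor:           ", round(std / sep, 2))

# A 2 x 2 pooling window with stride 2 and a 3 x 3 convolution with
# stride 2 and padding 1 both halve the feature map, but only the
# convolution has weights that training can adjust.
print("pooling output size:        ", output_size(size, 2, 2, 0))
print("stride-2 conv output size:  ", output_size(size, 3, 2, 1))
```

Under these assumed sizes, the separable layer needs roughly an eighth of the weights of the standard layer, and the stride-2 convolution produces the same spatial size as pooling. That equal output size is what allows it to act as a drop-in replacement in a transition layer.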