Recognizing Multiple Text Sequences from an Image by Pure End-to-End Learning

Zhenlong Xu,Shuigeng Zhou,Fan Bai,Zhanzhan Cheng,Yi Niu,Shiliang Pu
DOI: https://doi.org/10.1109/icpr48806.2021.9412079
2021-01-01
Abstract:We address a challenging problem: recognizing multiple text sequences from an image by pure end-to-end learning. It is twofold: 1) Multiple text sequences recognition. Each image may contain multiple text sequences of different content, location and orientation, we try to recognize all these texts in the image. 2) Pure end-to-end (PEE) learning. We solve the problem in a pure end-to-end learning way where each training image is labeled by only text transcripts of the contained sequences, without any geometric annotations. Most existing works recognize multiple text sequences from an image in a non-end-to-end (NEE) or quasi-end-to-end (QEE) way, in which each image is trained with both text transcripts and text locations. Only recently, a PEE method was proposed to recognize text sequences from an image where the text sequence was split to several lines in the image. However, it cannot be directly applied to recognizing multiple text sequences from an image. So in this paper, we propose a pure end-to-end learning method to recognize multiple text sequences from an image. Our method directly learns the probability distribution of multiple sequences conditioned on each input image, and outputs multiple text transcripts with a well-designed decoding strategy. To evaluate the proposed method, we construct several datasets mainly based on an existing public dataset and two real application scenarios. Experimental results show that the proposed method can effectively recognize multiple text sequences from images, and outperforms CTC-based and attention-based baseline methods.
What problem does this paper attempt to address?