Character-Level Street View Text Spotting Based on Deep Multisegmentation Network for Smarter Autonomous Driving

Chongsheng Zhang, Yuefeng Tao, Kai Du, Weiping Ding, Bin Wang, Ji Liu, Wei Wang
DOI: https://doi.org/10.1109/tai.2021.3116216
2021-01-01
IEEE Transactions on Artificial Intelligence
Abstract: Urban scenes are full of street entities bearing signboards. In autonomous driving, street view text spotting will therefore play a significant role in precisely understanding the surrounding scene. The text in an image usually provides important clues for accurate image understanding, whereas scene images without text are often ambiguous for existing computer vision algorithms to interpret. In this work, we propose a Multi-Segmentation network for character-level scene Text Detection (MSTD). MSTD introduces a densely connected atrous spatial pyramid pooling module that enlarges the receptive field of the feature extraction layer, so that long as well as large text instances can be localized. Moreover, it devises a double segmentation subnetwork in which two independent but inherently complementary losses co-optimize the network and increase the reliability of the confidence scores for predicting text/nontext areas. With the character instances detected by MSTD, scene text spotting can readily be performed with classic object recognition networks such as ResNet and DenseNet. We carry out extensive experiments on nine scene text datasets, demonstrating the strong performance of MSTD on character-level and line-level text instance localization and on scene text recognition; MSTD significantly outperforms state-of-the-art scene text detection methods and sequence-to-sequence-learning-based scene text recognizers.
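The claim that dense connections enlarge the receptive field can be checked with simple arithmetic. A stride-1 k×k convolution with dilation d adds (k-1)·d to the receptive field, so a chain of such layers has receptive field 1 + (k-1)·Σd. In a densely connected ASPP, each atrous layer also receives the outputs of all earlier layers, so any increasing subsequence of the dilation rates is a path to the concatenated output. The sketch below is a minimal illustration assuming 3×3 kernels and illustrative rates (3, 6, 12, 18, 24); the abstract does not state MSTD's actual rates, and the function names are hypothetical. It compares plain parallel ASPP with the densely connected variant, measuring theoretical receptive fields in feature-map cells rather than image pixels.

```python
from itertools import combinations
from typing import List, Sequence


def chain_rf(dilations: Sequence[int], kernel_size: int = 3) -> int:
    """Receptive field (feature-map cells) of stride-1 atrous convs applied in sequence."""
    return 1 + (kernel_size - 1) * sum(dilations)


def parallel_aspp_rfs(dilations: Sequence[int], kernel_size: int = 3) -> List[int]:
    """Plain ASPP: each branch is one atrous conv applied to the same input."""
    return sorted({chain_rf([d], kernel_size) for d in dilations})


def dense_aspp_rfs(dilations: Sequence[int], kernel_size: int = 3) -> List[int]:
    """Densely connected ASPP: layer i also sees every earlier layer's output, so any
    increasing subsequence of the rates is a valid path to the concatenated output.
    The empty path is the input feature map itself (receptive field 1)."""
    sizes = set()
    for r in range(len(dilations) + 1):
        # combinations() preserves input order, i.e. yields increasing subsequences
        for path in combinations(dilations, r):
            sizes.add(chain_rf(path, kernel_size))
    return sorted(sizes)


if __name__ == "__main__":
    rates = (3, 6, 12, 18, 24)  # illustrative assumption, not taken from the MSTD paper
    dense = dense_aspp_rfs(rates)
    print("parallel ASPP:", parallel_aspp_rfs(rates))
    print("dense ASPP max:", max(dense), "distinct scales:", len(dense))
```

Under these assumptions, parallel ASPP tops out at 49 feature-map cells, while the densely connected version reaches 127 and covers 22 distinct scales. This is the kind of enlarged, denser multi-scale coverage that would help with long and large text instances.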
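The abstract says that character instances detected by MSTD can be recognized with classic classifiers such as ResNet or DenseNet, but it does not describe how per-character predictions are assembled into readable strings. One plausible heuristic, offered here as a hypothetical sketch and not as the authors' procedure, groups character boxes into lines by their vertical centres and reads each line left to right. It assumes axis-aligned boxes, roughly horizontal text, and a hypothetical `CharDet` record holding a box plus the label a classifier assigned to it; `y_tol` is likewise an illustrative parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharDet:
    """Hypothetical record: one detected character box and its predicted label."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str

    @property
    def cx(self) -> float:
        return (self.x_min + self.x_max) / 2

    @property
    def cy(self) -> float:
        return (self.y_min + self.y_max) / 2


def assemble_lines(chars: List[CharDet], y_tol: float = 0.5) -> List[str]:
    """Group characters into horizontal lines, then read each line left to right.

    A character joins the first existing line whose mean vertical centre lies within
    y_tol * (the character's height) of its own centre; otherwise it starts a new line.
    """
    lines: List[List[CharDet]] = []
    # Visiting characters top to bottom makes lines appear in top-to-bottom order.
    for c in sorted(chars, key=lambda c: c.cy):
        height = c.y_max - c.y_min
        for line in lines:
            mean_cy = sum(m.cy for m in line) / len(line)
            if abs(c.cy - mean_cy) <= y_tol * height:
                line.append(c)
                break
        else:
            lines.append([c])
    # Within a line, reading order is left to right by horizontal centre.
    return ["".join(m.label for m in sorted(line, key=lambda m: m.cx)) for line in lines]


if __name__ == "__main__":
    # Shuffled detections for a two-line sign reading "EXIT" above "12".
    dets = [
        CharDet(24, 0, 30, 20, "I"), CharDet(12, 30, 22, 50, "2"),
        CharDet(0, 0, 10, 20, "E"), CharDet(32, 0, 42, 20, "T"),
        CharDet(0, 30, 10, 50, "1"), CharDet(12, 1, 22, 21, "X"),
    ]
    print(assemble_lines(dets))
```

Run on the shuffled detections above, the sketch prints `['EXIT', '12']`. Curved or rotated text would need a different grouping rule; the simple centre-based test is chosen only for clarity.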