Artistic-style text detector and a new Movie-Poster dataset

Aoxiang Ning, Yiting Wei, Minglong Xue, Senming Zhong
2024-06-24
Abstract: Although current text detection algorithms are effective in general scenarios, their performance declines on artistic-style text with complex structures. This paper proposes a method that uses Criss-Cross Attention and a residual dense block to address the incomplete detections and false detections that current algorithms produce on artistic-style text. Specifically, our method consists of a feature extraction backbone, a feature enhancement network, a multi-scale feature fusion module, and a boundary discrimination module. The feature enhancement network improves the model's perceptual capability in complex environments by fusing horizontal and vertical contextual information, allowing it to capture detailed features of artistic-style text that would otherwise be overlooked. We incorporate a residual dense block into the Feature Pyramid Network to suppress the effect of background noise during feature fusion. To avoid complex post-processing, we explore a boundary discrimination module that guides the correct generation of boundary proposals. Furthermore, given that movie poster titles often use stylized art fonts, we collected a Movie-Poster dataset to address the scarcity of artistic-style text data. Extensive experiments demonstrate that the proposed method achieves superior performance on the Movie-Poster dataset and produces excellent results on multiple benchmark datasets. The code and the Movie-Poster dataset will be available at: https://github.com/biedaxiaohua/Artistic-style-text-detection
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the poor performance of current text detection algorithms on artistic-style text with complex structures. Specifically, existing algorithms tend to produce incomplete detections or false detections on such text. The paper therefore proposes a method that uses Criss-Cross Attention and a Residual Dense Block to improve the detection of artistic-style text.

### Main Issues

1. **Incomplete Detection**: Existing algorithms tend to detect artistic-style text only partially because of its complex structure and the surrounding background noise.
2. **False Detection**: Existing algorithms are prone to misidentifying non-text pixels as text pixels in complex environments.
3. **Data Scarcity**: Little artistic-style text data is publicly available, which limits model training and performance improvement.

### Solutions

1. **Feature Enhancement Network**: Introduces Criss-Cross Attention and a Residual Dense Block to enhance the model's perception in complex environments and capture detailed features of artistic-style text.
2. **Multi-Scale Feature Fusion Module**: Designs a Residual Feature Pyramid Network (R-FPN) that improves feature fusion by suppressing the influence of background noise.
3. **Boundary Discrimination Module**: Proposes a Boundary Discrimination Module (BDM) that combines prior information with the feature map output to generate accurate boundary proposals, avoiding complex post-processing steps.
4. **Movie-Poster Dataset**: Collects a Movie-Poster dataset of 1500 movie posters to supplement existing artistic-style text data.

### Experimental Results

- **Performance Improvement**: On the Movie-Poster dataset, the proposed method reaches an F-measure of 87.34%. It also performs well on multiple benchmark datasets.
- **Module Effectiveness**: Ablation experiments verify the effectiveness of each module. The combination of the RCCA module and the R-FPN module significantly improves detection performance, especially on artistic-style text.

### Summary

The paper addresses incomplete detection and false detection of artistic-style text by proposing a feature enhancement network, a multi-scale feature fusion module, and a boundary discrimination module. The method is further validated on the newly collected Movie-Poster dataset.
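To make the "horizontal and vertical contextual information" idea more concrete, the following is a minimal sketch of criss-cross aggregation in pure Python. It is not the authors' implementation. It assumes identity projections in place of the learned query, key, and value layers that a real Criss-Cross Attention module would use, and it omits any learned scaling of the residual. The function names `criss_cross_path` and `criss_cross_attention` are hypothetical and chosen only for illustration.

```python
import math


def criss_cross_path(i, j, h, w):
    """Positions sharing row i or column j with (i, j): H + W - 1 in total."""
    row = [(i, x) for x in range(w)]
    col = [(y, j) for y in range(h) if y != i]  # skip (i, j), already in row
    return row + col


def criss_cross_attention(feat):
    """Aggregate context along each position's row and column.

    feat is an H x W grid (nested lists) of C-dimensional feature vectors.
    Assumption: queries, keys, and values are the raw features themselves;
    a trained module would first apply learned projections.
    """
    h, w = len(feat), len(feat[0])
    out = []
    for i in range(h):
        out_row = []
        for j in range(w):
            query = feat[i][j]
            path = criss_cross_path(i, j, h, w)
            # Dot-product affinity between the query and each path position.
            scores = [sum(a * b for a, b in zip(query, feat[y][x]))
                      for y, x in path]
            # Numerically stable softmax over the criss-cross path.
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]
            # Attention-weighted sum of the features on the path.
            context = [sum(wt * feat[y][x][c]
                           for wt, (y, x) in zip(weights, path))
                       for c in range(len(query))]
            # Residual connection: add the context to the input feature.
            out_row.append([a + b for a, b in zip(query, context)])
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Toy 2 x 3 feature map with 2 channels per position.
    toy = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
           [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]]
    for row in criss_cross_attention(toy):
        print(row)
```

Each position attends to only H + W - 1 positions instead of all H × W, which is what keeps this form of attention cheap on large feature maps. A single pass sees only the row and column of each position; applying the function a second time to its own output lets information from any position reach any other.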