Arbitrary-Shaped Text Detection with Adaptive Text Region Representation

Xiufeng Jiang, Shugong Xu, Shunqing Zhang, Shan Cao
DOI: https://doi.org/10.1109/ACCESS.2020.2999069
2021-04-01
Abstract: Text detection/localization, as an important task in computer vision, has witnessed substantial advancements in methodology and performance with convolutional neural networks. However, the vast majority of popular methods use rectangles or quadrangles to describe text regions. These representations have inherent drawbacks, especially relating to dense adjacent text and loose regional text boundaries, which usually cause difficulty detecting arbitrarily shaped text. In this paper, we propose a novel text region representation method, with a robust pipeline, which can precisely detect dense adjacent text instances with arbitrary shapes. We consider a text instance to be composed of an adaptive central text region mask and a corresponding expanding ratio between the central text region and the full text region. More specifically, our pipeline generates adaptive central text regions and corresponding expanding ratios with a proposed training strategy, followed by a new proposed post-processing algorithm which expands central text regions to the complete text instance with the corresponding expanding ratios. We demonstrated that our new text region representation is effective, and that the pipeline can precisely detect closely adjacent text instances of arbitrary shapes. Experimental results on common datasets demonstrate superior performance o
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the difficulty that existing text detection methods have with arbitrarily shaped text, especially densely adjacent text. Specifically:

1. **Limitations of rectangular or quadrilateral representations**: Most existing text detection methods describe text regions with rectangles or quadrilaterals, which have inherent flaws when dealing with curved or multi-directional text. For example, such a region may enclose additional non-text information, leading to inaccurate detection.
2. **Limitations of segmentation mask representations**: Segmentation masks can represent arbitrarily shaped text well, but in natural scenes the gaps between text instances may be small, which makes densely adjacent text instances difficult to separate.

To solve these problems, the authors propose a new text region representation that combines a central text region map with an expansion ratio to detect arbitrarily shaped text instances more accurately, especially densely adjacent text. The proposed solution has two components:

- **Central text region map**: Represents the core area of the text, with a shape similar to the original text.
- **Expansion ratio**: Used to expand the central text region to the complete text region.

With this method, the authors aim to overcome the shortcomings of existing methods in handling complex text shapes and dense text arrangements, and thereby achieve more accurate text detection.

### Formula Representation

To express this process more clearly, the expansion of the text region can be written with the following formulas. Let \( p_i \) be a boundary point of the central text region, \( q_i \) be the corresponding boundary point of the expanded complete text region, and \( d \) be the expansion ratio.
Then the expansion vector \( \overrightarrow{p_i q_i} \) can be expressed as:

\[ \overrightarrow{p_i q_i} = \frac{d}{\sin(\theta)} \cdot \text{Norm}(\overrightarrow{p_i q_i}) \]

where \( \theta \) is the angle formed by the two adjacent boundary point vectors \( \vec{v_1} \) and \( \vec{v_2} \) at \( p_i \), calculated as follows:

\[ \sin(\theta) = \frac{|\vec{v_1} \times \vec{v_2}|}{|\vec{v_1}| \cdot |\vec{v_2}|} \]

The normalized direction vector \( \text{Norm}(\overrightarrow{p_i q_i}) \) can be expressed as:

\[ \text{Norm}(\overrightarrow{p_i q_i}) = \frac{\vec{v_1} + \vec{v_2}}{|\vec{v_1}| + |\vec{v_2}|} \]

Finally, the complete text region point \( q_i \) can be calculated by the following formula:

\[ q_i = p_i + \overrightarrow{p_i q_i} \]

In this way, the central text region can be expanded to the complete text region, enabling accurate detection of arbitrarily shaped text.
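The formulas above can be transcribed into a short Python sketch. This is an illustration only, not the authors' implementation. The function name `expand_central_region` is hypothetical, and the summary does not say how \( \vec{v_1} \) and \( \vec{v_2} \) are oriented, so the sketch assumes they point from the two neighbouring boundary points towards \( p_i \). Under that assumption the displacement points outward only at convex corners. Concave corners would need a sign check that is not shown here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def expand_central_region(points: List[Point], d: float) -> List[Point]:
    """Apply q_i = p_i + (d / sin(theta)) * Norm to every boundary point.

    Assumption (not stated in the text): v1 and v2 point from the previous
    and next boundary points towards p_i.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three boundary points")

    expanded: List[Point] = []
    for i in range(n):
        px, py = points[i]
        ax, ay = points[i - 1]            # previous point (wraps around)
        bx, by = points[(i + 1) % n]      # next point (wraps around)

        v1 = (px - ax, py - ay)
        v2 = (px - bx, py - by)
        len1 = math.hypot(v1[0], v1[1])
        len2 = math.hypot(v2[0], v2[1])
        if len1 == 0.0 or len2 == 0.0:
            raise ValueError("duplicate consecutive boundary points")

        # sin(theta) = |v1 x v2| / (|v1| * |v2|)
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        sin_theta = abs(cross) / (len1 * len2)
        if sin_theta < 1e-12:
            # The formula divides by sin(theta); collinear neighbours
            # are rejected here rather than guessed at.
            raise ValueError("collinear boundary points: sin(theta) is zero")

        # Norm = (v1 + v2) / (|v1| + |v2|), exactly as written in the text
        norm = ((v1[0] + v2[0]) / (len1 + len2),
                (v1[1] + v2[1]) / (len1 + len2))

        scale = d / sin_theta
        expanded.append((px + scale * norm[0], py + scale * norm[1]))
    return expanded


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    print(expand_central_region(square, 1.0))
```

For the 2 × 2 square with \( d = 1 \), each corner moves by 0.5 along both axes, away from the centre. With the Norm term as written, the displacement length is \( d \cdot |\vec{v_1} + \vec{v_2}| / (\sin(\theta) \cdot (|\vec{v_1}| + |\vec{v_2}|)) \), so \( d \) behaves as a scale factor rather than a fixed pixel offset.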