Abstract: Text detection/localization, as an important task in computer vision, has witnessed substantial advancements in methodology and performance with convolutional neural networks. However, the vast majority of popular methods use rectangles or quadrangles to describe text regions. These representations have inherent drawbacks, especially relating to dense adjacent text and loose regional text boundaries, which usually cause difficulty detecting arbitrarily shaped text. In this paper, we propose a novel text region representation method, with a robust pipeline, which can precisely detect dense adjacent text instances with arbitrary shapes. We consider a text instance to be composed of an adaptive central text region mask and a corresponding expanding ratio between the central text region and the full text region. More specifically, our pipeline generates adaptive central text regions and corresponding expanding ratios with a proposed training strategy, followed by a new proposed post-processing algorithm which expands central text regions to the complete text instance with the corresponding expanding ratios. We demonstrated that our new text region representation is effective, and that the pipeline can precisely detect closely adjacent text instances of arbitrary shapes. Experimental results on common datasets demonstrate superior performance o

What problem does this paper attempt to address?

The problem this paper attempts to solve is the difficulty that existing text detection methods have with arbitrarily-shaped text, especially densely adjacent text. Specifically:

1. **Limitations of rectangular or quadrilateral representations**: Most existing text detection methods use rectangles or quadrilaterals to describe text regions, which have inherent flaws when dealing with curved or multi-directional text. For example, these representations may include additional non-text information, leading to inaccurate detection.
2. **Limitations of segmentation mask representations**: Although segmentation masks can represent arbitrarily-shaped text well, in natural scenes the gaps between text instances may be small, making it difficult to separate densely adjacent text instances.

To solve these problems, the authors propose a new text region representation that combines a central text region map with an expansion ratio to detect arbitrarily-shaped text instances, especially densely adjacent text, more accurately. The proposed solution has two components:

- **Central text region map**: Represents the core area of the text, with a shape similar to the original text.
- **Expansion ratio**: Used to expand the central text region to the complete text region.

With this representation, the authors aim to overcome the shortcomings of existing methods in handling complex text shapes and dense text arrangements, thereby achieving more accurate text detection.

### Formula Representation

The expansion of the text region can be expressed with the following formulas. Let \( p_i \) be a boundary point of the central text region, \( q_i \) be the corresponding boundary point of the expanded complete text region, and \( d \) be the expansion ratio. Then the expansion vector \( \overrightarrow{p_i q_i} \) can be expressed as:

\[ \overrightarrow{p_i q_i} = \left( \frac{d}{\sin(\theta)} \right) \cdot \text{Norm}(\overrightarrow{p_i q_i}) \]

where \( \theta \) is the angle formed by the two adjacent boundary point vectors \( \vec{v_1} \) and \( \vec{v_2} \), calculated as:

\[ \sin(\theta) = \frac{|\vec{v_1} \times \vec{v_2}|}{|\vec{v_1}| \cdot |\vec{v_2}|} \]

The direction vector \( \text{Norm}(\overrightarrow{p_i q_i}) \) is given as:

\[ \text{Norm}(\overrightarrow{p_i q_i}) = \frac{\vec{v_1} + \vec{v_2}}{|\vec{v_1}| + |\vec{v_2}|} \]

Finally, the complete text region point \( q_i \) is calculated as:

\[ q_i = p_i + \overrightarrow{p_i q_i} \]

In this way, the central text region is expanded to the complete text region, which allows arbitrarily-shaped text to be detected accurately.
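The expansion step can be illustrated with a small NumPy sketch. This is one geometric reading of the formulas, not the paper's implementation, and it rests on several assumptions: the function name `expand_polygon` is hypothetical; \( d \) is treated as an offset distance in pixels; the boundary points are ordered counter-clockwise with no repeated consecutive points; \( \vec{v_1} \) and \( \vec{v_2} \) are taken as the unit vectors pointing from the two neighbouring boundary points towards \( p_i \). The sketch also sums the two unit vectors instead of dividing by \( |\vec{v_1}| + |\vec{v_2}| \), and uses the signed value of \( \sin(\theta) \), because under these assumptions that choice moves every edge outward by exactly \( d \), including at concave corners.

```python
import numpy as np


def expand_polygon(points, d, eps=1e-8):
    """Push every edge of a counter-clockwise polygon outward by distance d.

    Hypothetical helper: a sketch of q_i = p_i + (d / sin(theta)) * (v1 + v2)
    with v1, v2 taken as unit edge vectors (an assumption, see the lead-in).
    """
    pts = np.asarray(points, dtype=float)
    prev_pts = np.roll(pts, 1, axis=0)    # p_{i-1} for every p_i
    next_pts = np.roll(pts, -1, axis=0)   # p_{i+1} for every p_i

    # Unit direction of the edge arriving at p_i and of the edge leaving it.
    e_in = pts - prev_pts
    e_out = next_pts - pts
    e_in /= np.linalg.norm(e_in, axis=1, keepdims=True)
    e_out /= np.linalg.norm(e_out, axis=1, keepdims=True)

    # Signed sin(theta) from the 2D cross product of the two unit edges:
    # positive at convex corners of a counter-clockwise polygon.
    sin_theta = e_in[:, 0] * e_out[:, 1] - e_in[:, 1] * e_out[:, 0]

    # v1 + v2 with v1 = e_in and v2 = -e_out, scaled by 1 / sin(theta).
    collinear = np.abs(sin_theta) < eps
    safe_sin = np.where(collinear, 1.0, sin_theta)
    corner_dir = (e_in - e_out) / safe_sin[:, None]

    # Where the two edges are collinear, sin(theta) is 0, so fall back to
    # the outward normal of the incoming edge.
    normal = np.stack([e_in[:, 1], -e_in[:, 0]], axis=1)
    direction = np.where(collinear[:, None], normal, corner_dir)

    return pts + d * direction


# Usage: expand a 2 x 2 square (counter-clockwise) by one unit.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
expanded = expand_polygon(square, d=1.0)
```

For the square above, each corner moves diagonally outward, so the result is the 4 x 4 square with corners at (-1, -1), (3, -1), (3, 3) and (-1, 3). The sketch does not handle self-intersections that can appear when \( d \) is large relative to the shape, nor spikes where the boundary doubles back on itself.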

Arbitrary-Shaped Text Detection with Adaptive Text Region Representation

Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation

Arbitrary-shaped Scene Text Detection with Keypoint-Based Shape Representation

Region-aware Arbitrary-shaped Text Detection with Progressive Fusion

Arbitrary-shaped scene text detection by predicting distance map

Arbitrary-shape Scene Text Detection via Visual-Relational Rectification and Contour Approximation

A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information

Arbitrarily Shaped Scene Text Detection with Dynamic Convolution

Learning Pixel Affinity Pyramid for Arbitrary-Shaped Text Detection

Arbitrary Shape Text Detection via Segmentation With Probability Maps

Arbitrary Shape Scene Text Detector with Accurate Text Instance Generation Based on Instance-Relevant Contexts

Arbitrary-Shaped Text Detection with Watershed Segmentation Network

Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images

Accurate Scene Text Detection Via Scale-Aware Data Augmentation and Shape Similarity Constraint

Bidirectional Regression for Arbitrary-Shaped Text Detection

Arbitrary Shape Text Detection via Boundary Transformer

Text Kernel Calculation for Arbitrary Shape Text Detection

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

TextRay: Contour-based Geometric Modeling for Arbitrary-shaped Scene Text Detection

CM-Net: Concentric Mask Based Arbitrary-Shaped Text Detection

CRNet: A Center-aware Representation for Detecting Text of Arbitrary Shapes