Abstract: Text detection that relies only on local context cannot robustly separate crowded texts because of dense layouts, inaccurate annotations, and complex backgrounds. T‐Skeleton explicitly exploits long‐range dependencies to address these challenges by extracting the text skeleton, which is given by the distance transform between text pixels and their nearest boundary pixels. T‐Skeleton has strong representation capacity and distinguishes long and curved texts well.

Existing segmentation‐based methods have made considerable progress in arbitrarily shaped text detection thanks to their ability to handle shape variation. However, accurately detecting text instances under dense layouts, inaccurate annotations, and complex backgrounds remains challenging. Many recent works focus on improving arbitrary boundary prediction, but they may still fail to distinguish individual instances in dense layouts: with inaccurate annotations and complex backgrounds, boundary pixels can be misclassified, producing inaccurate results (i.e., adhesive texts). Considering both local and long‐range dependencies, this paper proposes an efficient text detector, namely T‐Skeleton, to obtain more reliable segmentation‐based detections. In the spirit of object skeletonization, we introduce the text instance skeleton, which highlights the semantically significant structure (similar to the skeleton of a fish), to explicitly capture the long‐range dependencies of text instances. The key idea of T‐Skeleton is to calibrate the coarse text proposals by embedding text instance skeletons so that crowded texts are separated accurately and robustly. We further design a channel attention module to enlarge the performance margin between T‐Skeleton and the segmentation baseline. Experimental results on four publicly available datasets show the superiority of T‐Skeleton in handling long and curved texts.
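The distance-transform idea behind the text skeleton can be illustrated with a small sketch. This is not the T‐Skeleton implementation: it assumes one binary mask per text instance, and the function name `skeleton_map`, the normalisation by the peak distance, and the 0.9 ridge threshold are all hypothetical choices made for illustration. The brute-force search is only practical for tiny masks.

```python
import math


def skeleton_map(mask):
    """Distance of each text pixel to its nearest non-text pixel, scaled to [0, 1].

    mask: list of rows of 0/1 for a single text instance (1 = text).
    Pixels just outside the grid count as background, so an instance that
    touches the image border still has a boundary.
    Brute force, O(text pixels * background pixels): illustration only.
    """
    h, w = len(mask), len(mask[0])
    # Background = every 0 pixel plus a one-pixel frame around the grid.
    background = [
        (r, c)
        for r in range(-1, h + 1)
        for c in range(-1, w + 1)
        if not (0 <= r < h and 0 <= c < w) or mask[r][c] == 0
    ]
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = min(
                    math.hypot(r - br, c - bc) for br, bc in background
                )
    peak = max(max(row) for row in dist)
    if peak == 0:
        return dist  # no text pixels in this mask
    # Assumed normalisation: divide by the largest distance in the instance.
    return [[d / peak for d in row] for row in dist]


# A short horizontal "text line", three pixels thick.
example_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
example_map = skeleton_map(example_mask)
# Pixels near the peak form the central ridge (threshold is an assumption).
ridge = [
    (r, c)
    for r, row in enumerate(example_map)
    for c, value in enumerate(row)
    if value >= 0.9
]
```

In this toy mask the ridge runs along the middle row of the bar, away from its boundary. This is the intuition for why skeleton pixels are less affected by misclassified boundary pixels between adjacent instances.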

T‐Skeleton: Accurate scene text detection via instance‐aware skeleton embedding
