Irregular Text Block Recognition Via Decoupling Visual, Linguistic, and Positional Information
Ziyan Li, Lianwen Jin, Chengquan Zhang, Jiaxin Zhang, Zecheng Xie, Pengyuan Lyu, Kun Yao
DOI: https://doi.org/10.1016/j.patcog.2024.110516
IF: 8
2024-01-01
Pattern Recognition
Abstract: Scene text recognition has made great progress on regular formats, and recent research has focused on irregular text recognition. In this work, we investigate a new and challenging problem: recognizing a text block instance with an irregular arrangement of characters, which we refer to as irregular text block recognition (ITBR). This problem is prevalent in daily scenarios, especially with the increasing use of rich text designs in signboards, logos, posters, and other media. The primary challenge arises from the weakened positional cues and the highly complex reading order, which can often be deciphered only by relying heavily on the intrinsic linguistic information. Hence, conventional recognition methods that employ inflexible character grouping rules, are coupled with positional information, or are constrained by vocabulary reliance may struggle with the ITBR problem. To this end, we propose a progressive layout reasoning network (PLRN) that recognizes irregular text blocks by decoupling visual, linguistic, and positional information. PLRN comprises a character spotting module, which recognizes the character set based solely on visual features using a new TopK-rank decoding mechanism, and a linkage reasoning module, which interprets the character relationships within this set using a progressive refinement strategy. The linkages are first inferred from linguistic information and then progressively refined by incorporating proximity and tendency information, allowing for explicit decoupling and improved reasoning accuracy. To assess the effectiveness of the proposed method, we construct a new dataset called TextBlock600, which consists of 600 images of irregular text blocks, each with complete sequence annotations. Experimental results demonstrate that PLRN shows promising performance in ITBR, opening up possibilities for further research in this field. Code and datasets will be released.
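
The abstract does not give implementation details, so the following is only a toy sketch of the general idea of scoring character linkages from linguistic information first and then refining them with proximity. It is not the PLRN architecture. The function names (`link_scores`, `decode_order`), the bigram score table, the greedy decoding, and the `proximity_weight` parameter are all hypothetical and chosen for illustration; the tendency information mentioned in the abstract is not modeled here.

```python
import math

def link_scores(chars, bigram_score, proximity_weight):
    """Score each ordered pair (i, j) as "character j follows character i".

    chars: list of (symbol, x, y) tuples (hypothetical spotting output).
    bigram_score: dict mapping two-character strings to a linguistic score
                  (a stand-in for a learned language prior).
    proximity_weight: 0.0 gives linguistic-only scores; larger values
                      penalize distant pairs more strongly.
    """
    n = len(chars)
    scores = [[-math.inf] * n for _ in range(n)]
    for i, (ci, xi, yi) in enumerate(chars):
        for j, (cj, xj, yj) in enumerate(chars):
            if i == j:
                continue
            linguistic = bigram_score.get(ci + cj, 0.0)
            distance = math.hypot(xj - xi, yj - yi)
            scores[i][j] = linguistic - proximity_weight * distance
    return scores

def decode_order(scores, start):
    """Greedily follow the highest-scoring link to an unvisited character."""
    n = len(scores)
    order = [start]
    visited = {start}
    while len(order) < n:
        current = order[-1]
        candidates = [j for j in range(n) if j not in visited]
        best = max(candidates, key=lambda j: scores[current][j])
        order.append(best)
        visited.add(best)
    return order

# Toy block spelling "TOOT" with two identical "O" characters.
# The far "O" is listed first, so linguistic scores alone cannot
# tell the two apart; proximity breaks the tie.
chars = [("T", 0, 0), ("O", 5, 5), ("O", 1, 1), ("T", 6, 5)]
bigrams = {"TO": 1.0, "OO": 1.0, "OT": 1.0}

coarse = decode_order(link_scores(chars, bigrams, 0.0), start=0)
refined = decode_order(link_scores(chars, bigrams, 0.1), start=0)
```

In this toy case both passes spell the same string, but only the refined pass visits the nearer "O" first, which is the kind of ambiguity that purely linguistic linkage scores cannot resolve on their own.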