Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing

Yan Shu, Weichao Zeng, Zhenhang Li, Fangmin Zhao, Yu Zhou
2024-02-05
Abstract: Visual text, a pivotal element in both document and scene images, speaks volumes and attracts significant attention in the computer vision domain. Beyond visual text detection and recognition, the field of visual text processing has experienced a surge in research, driven by the advent of fundamental generative models. However, challenges persist due to the unique properties and features that distinguish text from general objects. Effectively leveraging these unique textual characteristics is crucial in visual text processing, as observed in our study. In this survey, we present a comprehensive, multi-perspective analysis of recent advancements in this field. Initially, we introduce a hierarchical taxonomy encompassing areas ranging from text image enhancement and restoration to text image manipulation, followed by different learning paradigms. Subsequently, we conduct an in-depth discussion of how specific textual features such as structure, stroke, semantics, style, and spatial context are seamlessly integrated into various tasks. Furthermore, we explore available public datasets and benchmark the reviewed methods on several widely used datasets. Finally, we identify principal challenges and potential avenues for future research. Our aim is to establish this survey as a fundamental resource, fostering continued exploration and innovation in the dynamic area of visual text processing.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the lack of a comprehensive review of visual text processing. Although many surveys cover text detection and recognition, they focus mainly on locating and reading text and do not analyze visual text processing as a whole. Visual text processing comprises two major categories: text image enhancement/restoration and text image manipulation. Together these span tasks from improving text quality in low-resolution images to removing, editing, and generating text in images. The paper fills this gap with a hierarchical taxonomy that covers these tasks and their learning paradigms, and it examines in depth how specific textual features (structure, stroke, semantics, style, and spatial context) are exploited. It also reviews public datasets, benchmarks the surveyed methods on several widely used datasets, and identifies open challenges and directions for future research.