Abstract: The exploration of linguistic information has advanced the scene text recognition task. Owing to its strengths in parallel reasoning and global relationship capture, the transformer-based language model (TLM) has recently achieved dominant performance. Because the TLM is decoupled from the recognition process, we argue that its capability is limited by the low-quality visual prediction it receives as input. Specifically: 1) A visual prediction with low character-wise accuracy increases the correction burden of the TLM. 2) An inconsistent word length between the visual prediction and the original image gives the TLM wrong language modeling guidance. In this paper, we propose a Progressive scEne Text Recognizer (PETR) that improves the capability of the transformer-based language model by addressing these two problems. First, a Destruction Learning Module (DLM) is proposed to incorporate linguistic information into the visual context. During training, DLM introduces the recognition of destructed images whose patches are disordered. By guiding the vision model to restore the patch order and make word-level predictions on the destructed images, DLM obtains visual predictions with high character-wise accuracy by exploring the inner relationships between local visual patches. Second, a new Language Rectification Module (LRM) is proposed to optimize the word length and thereby rectify the language guidance. By progressively applying LRM at different language modeling steps, a novel progressive rectification network is constructed to handle extremely challenging cases (e.g., distortion and occlusion). With DLM and LRM, PETR enhances the capability of the transformer-based language model from a more general perspective, namely by reducing the correction burden and rectifying the language modeling guidance.
Compared with parallel transformer-based methods, PETR obtains improvements of 1.0% and 0.8% on regular and irregular datasets, respectively, while introducing only 1.7M additional parameters. Extensive experiments on both English and Chinese benchmarks demonstrate that PETR achieves state-of-the-art results.
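
The "destructed images with disordered patches" used by DLM can be illustrated with a minimal sketch. This is not the authors' implementation: the patch grid (`rows`, `cols`), the helper names `shuffle_patches` and `restore_patches`, and the use of a nested list as the image are all assumptions made for illustration. The sketch only shows how an image might be cut into patches, shuffled, and paired with the permutation that a vision model could be trained to recover.

```python
import random


def _cut(image, rows, cols):
    """Cut a 2D image (list of pixel rows) into patches in row-major order."""
    h, w = len(image), len(image[0])
    if h % rows or w % cols:
        raise ValueError("image size must be divisible by the patch grid")
    ph, pw = h // rows, w // cols
    return [
        [row[c * pw:(c + 1) * pw] for row in image[r * ph:(r + 1) * ph]]
        for r in range(rows)
        for c in range(cols)
    ]


def _assemble(patches, rows, cols):
    """Stitch row-major patches back into a single 2D image."""
    image = []
    for r in range(rows):
        row_patches = patches[r * cols:(r + 1) * cols]
        for y in range(len(row_patches[0])):
            line = []
            for patch in row_patches:
                line.extend(patch[y])
            image.append(line)
    return image


def shuffle_patches(image, rows, cols, seed=None):
    """Return (destructed_image, order); order[i] is the original index
    of the patch now sitting in slot i (hypothetical restoration target)."""
    patches = _cut(image, rows, cols)
    order = list(range(rows * cols))
    random.Random(seed).shuffle(order)
    return _assemble([patches[i] for i in order], rows, cols), order


def restore_patches(destructed, order, rows, cols):
    """Undo the shuffle given the (ground-truth or predicted) patch order."""
    patches = _cut(destructed, rows, cols)
    restored = [None] * len(order)
    for slot, src in enumerate(order):
        restored[src] = patches[slot]
    return _assemble(restored, rows, cols)


# Toy 4x8 "image" whose pixel values encode their original position.
image = [[r * 8 + c for c in range(8)] for r in range(4)]
destructed, order = shuffle_patches(image, rows=2, cols=4, seed=0)
recovered = restore_patches(destructed, order, rows=2, cols=4)
```

In this sketch the permutation `order` plays the role of the patch-order supervision signal; how PETR actually encodes and predicts the order is not specified in the abstract.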

FEED PETs: Further Experimentation and Expansion on the Disambiguation of Potentially Euphemistic Terms

MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms

CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms

Turkish Delights: a Dataset on Turkish Euphemisms

TEDB System Description to a Shared Task on Euphemism Detection 2022

A Report on the Euphemisms Detection Shared Task

Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation

Exploring Euphemism Detection in Few-Shot and Zero-Shot Settings

PETR: Rethinking the Capability of Transformer-Based Language Model in Scene Text Recognition

EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

Prior Knowledge and Memory Enriched Transformer for Sign Language Translation

Improving the Generalizability of Text-Based Emotion Detection by Leveraging Transformers with Psycholinguistic Features

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

PET: Parameter-efficient Knowledge Distillation on Transformer

Euphemistic Phrase Detection by Masked Language Model

Impromptu Cybercrime Euphemism Detection

It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations

Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition

Transformer-based Named Entity Recognition for Parsing Clinical Trial Eligibility Criteria

Application of the transformer model algorithm in Chinese word sense disambiguation: a case study in Chinese language

Dawn of the transformer era in speech emotion recognition: closing the valence gap