Abstract: Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary. Traditional one-hot encoding methods struggle with the representation of hierarchical radicals, recognition of Out-Of-Vocabulary (OOV) characters, and on-device deployment due to their computational intensity. To address these challenges, we propose HierCode, a novel and lightweight codebook that exploits the innate hierarchical nature of Chinese characters. HierCode employs a multi-hot encoding strategy, leveraging hierarchical binary tree encoding and prototype learning to create distinctive, informative representations for each character. This approach not only facilitates zero-shot recognition of OOV characters by utilizing shared radicals and structures but also excels in line-level recognition tasks by computing similarity with visual features, a notable advantage over existing methods. Extensive experiments across diverse benchmarks, including handwritten, scene, document, web, and ancient text, have showcased HierCode's superiority for both conventional and zero-shot Chinese character or text recognition, exhibiting state-of-the-art performance with significantly fewer parameters and fast inference speed.
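The abstract does not spell out how the codes are constructed, so the following is only a minimal sketch of the general idea under stated assumptions, not HierCode's actual construction. It assumes each character is given as a binary tree of IDS structure operators (⿰ left-right, ⿱ top-bottom) over radicals, flattens that tree into the level-order slots of a full binary tree, and concatenates one one-hot block per slot into a multi-hot code. A random matrix `W` stands in for a learned projection from codes to prototypes, and a noisy copy of a prototype stands in for the output of a visual encoder. The vocabulary, the character trees, and the names `tree_slots`, `encode`, `prototypes`, and `classify` are all hypothetical.

```python
import numpy as np

# Hypothetical vocabulary: two binary IDS operators plus a handful of radicals.
VOCAB = ["<empty>", "⿰", "⿱", "氵", "木", "口", "日", "目"]
IDX = {s: i for i, s in enumerate(VOCAB)}
DEPTH = 2                          # full binary tree of depth 2
NUM_SLOTS = 2 ** DEPTH - 1         # 3 node slots: root, left child, right child
CODE_LEN = NUM_SLOTS * len(VOCAB)  # one one-hot block per slot

# Characters as (operator, left, right) trees; only binary operators assumed.
TRAIN = {"沐": ("⿰", "氵", "木"), "林": ("⿰", "木", "木"), "杏": ("⿱", "木", "口"),
         "相": ("⿰", "木", "目"), "昌": ("⿱", "日", "日")}
UNSEEN = {"杳": ("⿱", "木", "日"), "呆": ("⿱", "口", "木")}  # built from seen parts only


def tree_slots(tree):
    """Flatten a tree into level-order slots; nodes deeper than DEPTH are dropped."""
    slots = ["<empty>"] * NUM_SLOTS

    def visit(node, pos):
        if node is None or pos >= NUM_SLOTS:
            return
        if isinstance(node, str):  # leaf: a radical
            slots[pos] = node
            return
        op, left, right = node
        slots[pos] = op
        visit(left, 2 * pos + 1)
        visit(right, 2 * pos + 2)

    visit(tree, 0)
    return slots


def encode(tree):
    """Multi-hot code with exactly one active bit per slot."""
    code = np.zeros(CODE_LEN, dtype=np.float32)
    for pos, label in enumerate(tree_slots(tree)):
        code[pos * len(VOCAB) + IDX[label]] = 1.0
    return code


rng = np.random.default_rng(0)
EMB_DIM = 64
W = rng.standard_normal((CODE_LEN, EMB_DIM))  # stand-in for a learned projection


def prototypes(chars):
    """Project each character's multi-hot code to a prototype embedding."""
    names = list(chars)
    return names, np.stack([encode(chars[c]) for c in names]) @ W


def classify(feature, names, protos):
    """Pick the character whose prototype has the highest cosine similarity."""
    sims = protos @ feature / (np.linalg.norm(protos, axis=1) * np.linalg.norm(feature))
    return names[int(np.argmax(sims))]


# Zero-shot: unseen characters join the candidate set; W is not retrained.
names, protos = prototypes({**TRAIN, **UNSEEN})
feature = protos[names.index("呆")] + 0.01 * rng.standard_normal(EMB_DIM)  # simulated visual feature
print(classify(feature, names, protos))  # prints 呆
```

Because 杳 and 呆 use only operators and radicals that already occur in the training characters, their codes can be computed and added to the candidate set without any new parameters. This mirrors, in miniature, the intuition that shared radicals and structures enable zero-shot recognition; the exact HierCode tree encoding and prototype learning may differ.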

What problem does this paper attempt to address?

This paper addresses several key problems in Chinese text recognition:

1. **Complex character structures**: Chinese characters have intricate structures and the vocabulary is large. Traditional one-hot encoding cannot represent the hierarchical structure of Chinese characters, such as their radicals and structural layouts, so much of this information is lost from the label representation.
2. **Zero-shot recognition**: Because the number of Chinese characters is large and still growing, existing models struggle to recognize characters they have never seen (zero-shot recognition). For example, the latest Chinese standard, GB18030-2022, contains 87,887 categories, far more than the 27,533 categories in GB18030-2000. A model therefore needs to recognize characters that did not appear in its training set.
3. **Computational cost of deployment**: One-hot encoding introduces a huge number of parameters in the classification layer. As the number of character categories grows, the classification layer becomes extremely large and accounts for most of the model's parameters, which makes deployment on devices with limited computing resources difficult.

To solve these problems, the authors propose HierCode, a lightweight hierarchical codebook. HierCode uses a multi-hot encoding strategy, hierarchical binary tree encoding, and prototype learning to create a distinctive and informative representation for each character. This design supports zero-shot recognition of out-of-vocabulary characters through shared radicals and structures, and it also extends to line-level recognition by computing similarity with visual features. Experiments show that HierCode achieves superior performance on multiple benchmark datasets while using fewer parameters and running faster at inference time.
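The parameter argument about the classification layer can be checked with simple arithmetic. The sketch below assumes a 512-dimensional feature feeding a fully connected classification layer (the feature size is an assumption, not a figure from the paper) and compares a one-hot head over the two GB18030 category counts quoted above with a head that predicts a fixed-length binary code. As a reference point only, it uses the minimum flat binary code length, ⌈log₂ N⌉ bits; HierCode's hierarchical multi-hot code is a different and longer code, and the point is only that a code-based head scales with code length rather than with N. The function names are hypothetical, and only the prediction layer is counted.

```python
import math


def one_hot_head_params(feat_dim: int, num_classes: int) -> int:
    """Weights plus biases of a fully connected one-hot classification layer."""
    return feat_dim * num_classes + num_classes


def min_binary_code_len(num_classes: int) -> int:
    """Fewest bits that can give every class a distinct binary code."""
    return math.ceil(math.log2(num_classes))


def code_head_params(feat_dim: int, code_len: int) -> int:
    """Weights plus biases of a layer that predicts a code of length code_len."""
    return feat_dim * code_len + code_len


FEAT_DIM = 512  # assumed feature size
for standard, n in [("GB18030-2000", 27_533), ("GB18030-2022", 87_887)]:
    bits = min_binary_code_len(n)
    print(f"{standard}: one-hot head {one_hot_head_params(FEAT_DIM, n):,} params; "
          f"{bits}-bit code head {code_head_params(FEAT_DIM, bits):,} params")
```

Under these assumptions the one-hot head grows from 14,124,429 to 45,086,031 parameters as the category count rises from 27,533 to 87,887, while the minimal code head grows only from 7,695 to 8,721.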

HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

Zero-shot Handwritten Chinese Character Recognition with hierarchical decomposition embedding

A Character Recognition Scheme Based on Object Oriented Design for Tibetan Buddhist Texts

Zero-Shot Offline Handwritten Chinese Character Recognition with Graph Embedding

Zero-Shot Chinese Character Recognition with Stroke-Level Decomposition

Designing compact classifiers for rotation-free recognition of large vocabulary online handwritten Chinese characters

EMU: Effective Multi-Hot Encoding Net for Lightweight Scene Text Recognition with a Large Character Set

Hippocampus-heuristic Character Recognition Network for Zero-shot Learning

Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning

Deep Learning-Driven Approach for Handwritten Chinese Character Classification

Exploring Better Text Image Translation with Multimodal Codebook

Open Set Chinese Character Recognition using Multi-typed Attributes

A Deep Neural Network for Chinese Zero Pronoun Resolution

PP-OCR: A Practical Ultra Lightweight OCR System

PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition

Toward Zero-shot Character Recognition: A Gold Standard Dataset with Radical-level Annotations

A Study of Designing Compact Classifiers Using Deep Neural Networks for Online Handwritten Chinese Character Recognition

A Large Chinese Text Dataset in the Wild

Ultra Light OCR Competition Technical Report

An approach for handwritten Chinese text recognition unifying character segmentation and recognition

UniCode: Learning a Unified Codebook for Multimodal Large Language Models