HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin
2024-03-21
Abstract: Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary. Traditional one-hot encoding methods struggle with the representation of hierarchical radicals, recognition of Out-Of-Vocabulary (OOV) characters, and on-device deployment due to their computational intensity. To address these challenges, we propose HierCode, a novel and lightweight codebook that exploits the innate hierarchical nature of Chinese characters. HierCode employs a multi-hot encoding strategy, leveraging hierarchical binary tree encoding and prototype learning to create distinctive, informative representations for each character. This approach not only facilitates zero-shot recognition of OOV characters by utilizing shared radicals and structures but also excels in line-level recognition tasks by computing similarity with visual features, a notable advantage over existing methods. Extensive experiments across diverse benchmarks, including handwritten, scene, document, web, and ancient text, have showcased HierCode's superiority for both conventional and zero-shot Chinese character or text recognition, exhibiting state-of-the-art performance with significantly fewer parameters and fast inference speed.
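To make the encoding idea in the abstract concrete, here is a minimal sketch. It assumes a toy component vocabulary, a depth-2 binary decomposition tree, and one one-hot slot per tree node concatenated into a multi-hot code. It is not the paper's exact construction: the names `VOCAB`, `encode`, and `recognize`, the decompositions, and the mock visual feature are all hypothetical, and HierCode's prototype learning is omitted. What it does show is how a character held out of training (杏) still receives a code built from components seen in training, and how it can then be recognized by cosine similarity against a feature vector.

```python
from math import sqrt

# Hypothetical component vocabulary: a padding token, two structure
# symbols (left-right, top-bottom) and a handful of radicals.
VOCAB = ["<empty>", "⿰", "⿱", "木", "口", "亻", "日", "月"]
INDEX = {tok: i for i, tok in enumerate(VOCAB)}
DEPTH = 2                    # assumed maximum tree depth
NUM_NODES = 2 ** DEPTH - 1   # nodes in a complete binary tree of that depth


def encode(tree):
    """Concatenate one one-hot slot per tree node into a multi-hot code."""
    slots = [[0] * len(VOCAB) for _ in range(NUM_NODES)]

    def visit(node, pos):
        if pos >= NUM_NODES:
            raise ValueError("decomposition is deeper than DEPTH")
        if isinstance(node, tuple):        # internal node: (structure, left, right)
            structure, left, right = node
            slots[pos][INDEX[structure]] = 1
            visit(left, 2 * pos + 1)       # heap-style child positions
            visit(right, 2 * pos + 2)
        else:                              # leaf node: a single radical
            slots[pos][INDEX[node]] = 1

    visit(tree, 0)
    for slot in slots:                     # unused positions are marked <empty>
        if not any(slot):
            slot[INDEX["<empty>"]] = 1
    return [bit for slot in slots for bit in slot]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(feature, codebook):
    """Pick the character whose code is most similar to the feature vector."""
    return max(codebook, key=lambda ch: cosine(feature, codebook[ch]))


# Toy decompositions (hypothetical, simplified); "杏" is held out of training.
TRAIN = {"林": ("⿰", "木", "木"), "呆": ("⿱", "口", "木"),
         "休": ("⿰", "亻", "木"), "明": ("⿰", "日", "月")}
UNSEEN = {"杏": ("⿱", "木", "口")}

# Zero-shot: the unseen character's code is built from seen components only.
codebook = {ch: encode(t) for ch, t in {**TRAIN, **UNSEEN}.items()}

# Stand-in for a projected visual feature: a scaled, noisy copy of 杏's code.
feature = [0.9 * b for b in codebook["杏"]]
feature[len(VOCAB) + INDEX["口"]] += 0.3   # spurious activation in node 1
print(recognize(feature, codebook))        # prints 杏
```

Because 杏 and 呆 share the same radicals in swapped positions, the per-node slots are what separate them here: a bag-of-radicals code without tree positions could not tell the two apart.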
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper attempts to solve several key problems in Chinese text recognition:

1. **Complex character structures**: Chinese characters have intricate structures and a large vocabulary. Traditional one-hot encoding treats every character as an independent class, so it cannot represent the hierarchical structure of Chinese characters (their radicals and spatial layout), and this structural information is lost from the feature representation.
2. **Zero-shot recognition**: Because the number of Chinese characters is large and still growing, existing models struggle to recognize characters they have never seen (zero-shot recognition). For example, the latest Chinese standard, GB18030-2022, contains 87,887 categories, far more than the 27,533 categories in GB18030-2000. The model therefore needs to recognize characters that do not appear in the training set.
3. **Computational efficiency for deployment**: One-hot encoding introduces a huge number of parameters in the classification layer, which grows linearly with the number of character categories. As the vocabulary grows, the classification layer becomes extremely large and accounts for most of the model's parameters, which makes deployment on devices with limited computing resources difficult.

To solve these problems, the authors propose HierCode, a lightweight hierarchical codebook. HierCode combines a multi-hot encoding strategy, hierarchical binary tree encoding, and prototype learning to create a distinctive and informative representation for each character. This design supports zero-shot recognition of OOV characters through shared radicals and structures, and it also extends to line-level recognition, where characters are recognized by computing the similarity between their codes and visual features. Experimental results show that HierCode achieves superior performance on multiple benchmark datasets while using fewer parameters and offering faster inference.
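The parameter argument in point 3 comes down to simple arithmetic. The sketch below compares a one-hot linear classifier, whose weight matrix has one row per class, with a head that only projects features into a fixed-length code space. It assumes a hypothetical 512-dimensional feature and a hypothetical 1,024-bit code length; neither value is taken from the paper, and the function names are illustrative. The class counts are the two GB18030 figures quoted above.

```python
def one_hot_head_params(feature_dim, num_classes, bias=True):
    """Weights (+ biases) of a linear classifier with one output per class."""
    return feature_dim * num_classes + (num_classes if bias else 0)


def code_head_params(feature_dim, code_length, bias=True):
    """Weights (+ biases) of a projection from features into a code space.

    In this simplified model the per-character codes are fixed binary
    vectors derived from the radical tree, so they add no trainable
    per-class weights; the head size does not depend on the class count.
    """
    return feature_dim * code_length + (code_length if bias else 0)


FEATURE_DIM = 512    # hypothetical backbone feature width
CODE_LENGTH = 1024   # hypothetical code length, NOT the value used in the paper

for standard, num_classes in [("GB18030-2000", 27_533), ("GB18030-2022", 87_887)]:
    print(f"{standard}: one-hot head "
          f"{one_hot_head_params(FEATURE_DIM, num_classes):,} params, "
          f"code head {code_head_params(FEATURE_DIM, CODE_LENGTH):,} params")
```

Under these assumptions the one-hot head needs about 14.1 million parameters for GB18030-2000 and about 45.1 million for GB18030-2022, while the projection head stays at 525,312 regardless of vocabulary size. The codebook itself still has to be stored for matching, but in this simplified model it is a fixed table of binary codes rather than trainable weights.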