HENet: Hyperbolic-Based Encoder-Decoder Network for Word Spotting in Historical Mongolian Documents
Jing Zhang, Hongxi Wei, Qing Zhang, Xiandong Chen, Jingtao Ma
DOI: https://doi.org/10.1109/icassp48485.2024.10446514
2024-01-01
Abstract: In historical Mongolian document image retrieval (HMDIR), word spotting is inherently challenging because out-of-vocabulary (OOV) words appear frequently. Existing methods have mainly focused on query-by-example (QBE) and neglected the query-by-string (QBS) approach. Meanwhile, the hierarchical structure of words makes Euclidean space a suboptimal choice for representing such complex structured data. To address these problems, we propose a novel method that leverages a shared hyperbolic space to align text strings and word images. Specifically, we use the Pyramidal Histogram of Characters (PHOC) for text string embedding and a robust encoder-decoder architecture for word image embedding, then map both embeddings into the shared hyperbolic space. Moreover, we introduce a new dataset of historical Mongolian documents called Geser, which includes 143,508 word images and a vocabulary of 10,951 words. Extensive experiments on two datasets of historical Mongolian documents (Kanjur and Geser) with an OOV partitioning scheme demonstrate that our proposed method surpasses state-of-the-art methods and achieves outstanding results on Geser.
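The text-side pipeline named in the abstract (PHOC embedding, then mapping into a hyperbolic space) can be illustrated with a minimal sketch. Everything specific below is an assumption for illustration and is not taken from the paper: the Poincaré ball model with curvature -1, the exponential map at the origin applied directly to the PHOC vector (a real system would presumably apply a learned projection first), the placeholder Latin alphabet, and the pyramid levels. The word image encoder-decoder is omitted entirely.

```python
import math

# Placeholder alphabet and pyramid levels (assumptions, not the paper's settings).
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LEVELS = (1, 2, 3)


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Binary PHOC: one alphabet-sized histogram per region, per pyramid level."""
    vec = []
    n = len(word)
    for level in levels:
        for region in range(level):
            hist = [0.0] * len(alphabet)
            for i, ch in enumerate(word):
                # Work in integer units of 1 / (n * level) to avoid float ties.
                c_lo, c_hi = i * level, (i + 1) * level      # character span
                r_lo, r_hi = region * n, (region + 1) * n    # region span
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                # A character counts for a region if at least half of its
                # span (width = level units) lies inside that region.
                if ch in alphabet and 2 * overlap >= level:
                    hist[alphabet.index(ch)] = 1.0
            vec.extend(hist)
    return vec


def exp_map_origin(v, eps=1e-5):
    """Map a Euclidean vector into the open unit (Poincare) ball at the origin."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    # Clip the radius so points stay strictly inside the ball.
    radius = min(math.tanh(norm), 1.0 - eps)
    return [radius * x / norm for x in v]


def poincare_distance(u, v):
    """Geodesic distance between two points of the Poincare ball (curvature -1)."""
    def sq_norm(a):
        return sum(x * x for x in a)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


def text_distance(word_a, word_b):
    """Hyperbolic distance between the PHOC embeddings of two strings."""
    return poincare_distance(exp_map_origin(phoc(word_a)),
                             exp_map_origin(phoc(word_b)))
```

In a query-by-string setting under these assumptions, a query string would be embedded with `phoc` and `exp_map_origin`, and candidate word images (embedded by a separate image encoder into the same ball) would be ranked by `poincare_distance` to the query.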