A Multi-Scale Bimodal Fusion Network for Robust and Accurate Online Handwriting Recognition.

Zhen Xu, Ziqiang Chen, Yaqiang Wu, Hui Li, Wanjun Lv, Lianwen Jin, Qianying Wang
DOI: https://doi.org/10.1109/ICASSP48485.2024.10446390
2024-01-01
Abstract: Online handwriting recognition based on sensor trajectory information faces several unresolved challenges: 1) sensor signals lack sufficient global spatial context; 2) different recognition tasks have inconsistent requirements for feature receptive fields. The latter stems from the inconsistent scales of the input sequences and the differing semantic complexity of language units. In this paper, we propose an online handwritten text recognition method based on multi-scale bimodal feature fusion to address these challenges. First, we generate pseudo-images from the input sequences to supply the missing two-dimensional spatial information, and then extract multi-scale features from both trajectories and images simultaneously. Subsequently, our bimodal embedding learning module jointly learns feature embeddings for trajectories and images at different scales. These embeddings are then fed into a novel position-aware multi-scale fusion module to extract features for text prediction. The proposed modules effectively mitigate the misalignment of scales and semantics. Experimental results show that our approach significantly improves performance on various handwriting recognition datasets.
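The abstract does not say how the pseudo-images are generated from the trajectory. As a minimal sketch of the general idea only, and not the authors' procedure, the following rasterizes pen strokes into a small binary grid. The function name `trajectory_to_pseudo_image`, the stroke format (lists of `(x, y)` points), the grid size, and the assumption that `y` grows downward are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def trajectory_to_pseudo_image(strokes: List[Stroke], size: int = 32) -> List[List[int]]:
    """Rasterize pen strokes into a size x size binary grid (illustrative helper)."""
    points = [p for stroke in strokes for p in stroke]
    image = [[0] * size for _ in range(size)]
    if not points:
        return image

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # One shared scale for both axes preserves the aspect ratio of the ink.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = (size - 1) / span

    def to_pixel(p: Point) -> Tuple[int, int]:
        # Assumption: y grows downward, so y maps directly to the row index.
        col = round((p[0] - min_x) * scale)
        row = round((p[1] - min_y) * scale)
        return row, col

    for stroke in strokes:
        if len(stroke) == 1:
            r, c = to_pixel(stroke[0])
            image[r][c] = 1
        # Connect consecutive samples within a stroke; the pen-up gaps
        # between strokes are left empty.
        for start, end in zip(stroke, stroke[1:]):
            r0, c0 = to_pixel(start)
            r1, c1 = to_pixel(end)
            steps = max(abs(r1 - r0), abs(c1 - c0), 1)
            for i in range(steps + 1):
                t = i / steps
                image[round(r0 + t * (r1 - r0))][round(c0 + t * (c1 - c0))] = 1
    return image


if __name__ == "__main__":
    # Two horizontal strokes, drawn as the top and bottom rows of the grid.
    demo = trajectory_to_pseudo_image([[(0, 0), (10, 0)], [(0, 10), (10, 10)]], size=8)
    for row in demo:
        print("".join("#" if v else "." for v in row))
```

A grid like this makes the global layout of the ink explicit, which a sequence of point offsets only encodes implicitly. A real system would more likely render anti-aliased strokes at several resolutions before feature extraction.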