Viewing Writing As Video: Optical Flow Based Multi-Modal Handwritten Mathematical Expression Recognition

Hanbo Cheng, Jun Du, Pengfei Hu, Jiefeng Ma, Zhenrong Zhang, Mobai Xue
DOI: https://doi.org/10.1109/icassp48485.2024.10447346
2024-01-01
Abstract: Handwritten Mathematical Expression Recognition (HMER) is a crucial task in the domain of document intelligence. It encompasses online and offline modalities, which take the trajectory sequence and the static image as input, respectively. It is intuitive to use both modalities to build a more powerful recognition system. However, the substantial heterogeneity between the online and offline modalities makes their alignment and fusion difficult. In this work, we view the writing process as a video and introduce the Aggregated Optical Flow Map (AOFM) to represent the online modality in a form that is more compatible with the offline modality. Additionally, we propose the Optical Flow Aware Network (OFAN) to automatically extract, align, and fuse features across the online and offline modalities. Experiments show that our method can be seamlessly applied to multiple existing offline HMER models, yielding stable and substantial improvements on the CROHME 2014, 2016, and 2019 datasets. The code for this work is available at https://github.com/Hanbo-Cheng/OFAN.git.