Track, Attend, and Parse (TAP): an End-to-End Framework for Online Handwritten Mathematical Expression Recognition

Jianshu Zhang,Jun Du,Lirong Dai
DOI: https://doi.org/10.1109/tmm.2018.2844689
IF: 7.3
2019-01-01
IEEE Transactions on Multimedia
Abstract:In this paper, we introduce Track, Attend, and Parse (TAP), an end-to-end approach based on neural networks for online handwritten mathematical expression recognition (OHMER). The architecture of TAP consists of a tracker and a parser. The tracker employs a stack of bidirectional recurrent neural networks with gated recurrent units (GRU) to model the input handwritten traces, which can fully utilize the dynamic trajectory information in OHMER. Followed by the tracker, the parser adopts a GRU equipped with guided hybrid attention (GHA) to generate notations. The proposed GHA is composed of a coverage-based spatial attention, a temporal attention, and an attention guider. Moreover, we demonstrate the strong complementarity between offline information with static-image input and online information with ink-trajectory input by blending a fully convolutional networks-based watcher into TAP. Inherently, unlike traditional methods, this end-to-end framework does not require the explicit symbol segmentation and a predefined expression grammar for parsing. Validated on a benchmark published by the CROHME competition, the proposed approach outperforms the state-of-the-art methods and achieves the best reported results with an expression recognition accuracy of 61.16% on CROHME 2014 and 57.02% on CROHME 2016, using only official training dataset.
What problem does this paper attempt to address?