Imbalanced Conditional Conv-Transformer for Mathematical Expression Recognition

Shuaijian Ji, Zhaokun Zhou, Yuqing Wang, Baishan Duan, Zhenyu Weng, Liang Xu, Yuesheng Zhu
DOI: https://doi.org/10.1007/978-3-031-44223-0_36
2023-01-01
Abstract: Mathematical Expression Recognition (MER), which aims to convert images into the corresponding LaTeX markup, has been a long-standing research topic. Previous methods employ dense computation in both the encoder and the decoder, and so suffer from slow convergence, limited performance, and the design complexity of an extra backbone network placed before the encoder. To alleviate these limitations, we propose a fast-converging, end-to-end ImBalanced Conditional Conv-Transformer (IBCCT) architecture that combines a light encoder with a heavy decoder. In addition, we extend the traditional encoder-decoder framework by learning a lightweight network that generates, for each image, a conditional token that injects global position information. Extensive experiments show that our IBCCT-Base model achieves better performance with faster convergence while using 33% fewer parameters than the SOTA method. In particular, our IBCCT-Large model achieves 94.04% and 93.2% on the Match metric, which are 1.34% and 4.12% higher, respectively, than the SOTA method.
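The abstract does not say how the per-image conditional token is produced. As a rough illustration only, this sketch assumes a hypothetical lightweight head that global-average-pools the encoder feature map, projects the result with a single linear layer plus tanh, and prepends the token to the flattened patch sequence, similar in spirit to a class token. The names `conditional_token` and `flatten_with_condition`, the pooling, the activation, and the shapes are all assumptions, not the authors' design.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def global_average_pool(feature_map: List[List[Vector]]) -> Vector:
    """Average an H x W grid of C-dim feature vectors into one C-dim vector."""
    cells = [v for row in feature_map for v in row]
    channels = len(cells[0])
    return [sum(v[c] for v in cells) / len(cells) for c in range(channels)]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute y = W x + b, with W stored as d_out rows of length d_in."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def conditional_token(feature_map: List[List[Vector]], weight: Matrix, bias: Vector) -> Vector:
    """Hypothetical lightweight head: summarize the whole image in one token."""
    pooled = global_average_pool(feature_map)
    return [math.tanh(v) for v in linear(pooled, weight, bias)]


def flatten_with_condition(feature_map: List[List[Vector]], weight: Matrix, bias: Vector) -> List[Vector]:
    """Flatten H x W patches row by row and prepend the conditional token."""
    cond = conditional_token(feature_map, weight, bias)
    patches = [list(v) for row in feature_map for v in row]
    return [cond] + patches


# Usage with random features (assumption: channel count C equals model width D).
random.seed(0)
H, W, D = 2, 3, 4
fmap = [[[random.uniform(-1, 1) for _ in range(D)] for _ in range(W)] for _ in range(H)]
weight = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(D)]
bias = [0.0] * D
seq = flatten_with_condition(fmap, weight, bias)
print(len(seq), len(seq[0]))  # 7 4
```

In this sketch the token is computed from the whole image rather than from one spatial location, so encoder self-attention over the sequence can read image-level context from it; a real model would learn `weight` and `bias` jointly with the encoder and decoder.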
What problem does this paper attempt to address?