Vector Quantization Knowledge Transfer for End-to-End Text Image Machine Translation.

Cong Ma, Yaping Zhang, Yang Zhao, Yu Zhou, Chengqing Zong
DOI: https://doi.org/10.1109/ICASSP48485.2024.10447334
2024-01-01
Abstract: End-to-end text image machine translation (TIMT) aims to translate source-language text embedded in images into the target language without recognizing the intermediate text in the images. However, the scarcity of end-to-end TIMT data limits translation performance. Existing research explores aligning continuous features with those from the related tasks of text image recognition (TIR) or machine translation (MT) to alleviate this data limitation, but alignment in a continuous vector space is extremely difficult and inevitably introduces fitting errors, resulting in significant performance degradation. To better align TIMT features with MT semantic features, we propose a novel Vector Quantization Knowledge Transfer (VQKT) method that employs a trainable codebook to quantize continuous features into a discrete space. The quantization distribution of the MT features serves as the teacher distribution that guides the TIMT model to generate similar discrete codes. Through alignment and knowledge transfer based on probability distributions, the TIMT model better imitates the feature representations of the MT teacher model and generates high-quality target-language translations. Extensive experiments demonstrate that VQKT significantly outperforms existing end-to-end TIMT methods.
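The quantize-then-align idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each feature's quantization distribution is a softmax over negative squared distances to the codebook entries (scaled by a hypothetical `temperature`), and that the transfer loss is the mean KL divergence KL(teacher || student) over already-aligned positions. All function names are hypothetical, the codebook is fixed rather than trained, and no gradients are computed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def code_distribution(feature: Vector, codebook: Sequence[Vector],
                      temperature: float = 1.0) -> List[float]:
    """Soft assignment of one feature to the codebook entries.

    Assumption: softmax over negative squared distances, scaled by a
    hypothetical temperature; the paper's exact form may differ.
    """
    logits = [-squared_distance(feature, c) / temperature for c in codebook]
    peak = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def hard_code(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the nearest codebook entry, i.e. the discrete code."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(feature, codebook[k]))


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(p || q), with p the teacher and q the student distribution."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def transfer_loss(teacher_feats: Sequence[Vector],
                  student_feats: Sequence[Vector],
                  codebook: Sequence[Vector],
                  temperature: float = 1.0) -> float:
    """Mean per-position KL between teacher (MT) and student (TIMT)
    code distributions; assumes the two sequences are already aligned."""
    losses = [
        kl_divergence(code_distribution(t, codebook, temperature),
                      code_distribution(s, codebook, temperature))
        for t, s in zip(teacher_feats, student_feats)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 2-D codebook with three discrete codes.
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    teacher = [[0.9, 0.1], [0.1, 0.8]]        # MT features
    student_close = [[0.8, 0.0], [0.0, 0.9]]  # TIMT features near the same codes
    student_far = [[0.0, 0.9], [0.9, 0.0]]    # TIMT features near other codes

    print(hard_code(teacher[0], codebook))  # 1
    close = transfer_loss(teacher, student_close, codebook)
    far = transfer_loss(teacher, student_far, codebook)
    print(close < far)  # True
```

With this toy codebook, the student whose features fall near the teacher's codes incurs a much smaller transfer loss than one whose features land on other codes. Comparing distributions over a shared discrete codebook, rather than regressing continuous vectors directly, is the design choice the abstract credits with reducing fitting errors; in training, minimizing such a loss would push TIMT features toward the MT teacher's discrete codes.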