Heterogeneous Attention Based Transformer for Sign Language Translation

Hao Zhang, Yixiang Sun, Zenghui Liu, Qiyuan Liu, Xiyao Liu, Ming Jiang, Gerald Schafer, Hui Fang
DOI: https://doi.org/10.1016/j.asoc.2023.110526
2022-01-01
SSRN Electronic Journal
Abstract: Sign language translation (SLT) has attracted significant interest from both research and industry, as it enables convenient communication with the deaf-mute community. While recent transformer-based models have improved sign translation performance, how to design an efficient transformer-based deep network architecture that effectively extracts joint visual-text features by exploiting multi-level spatial and temporal contextual information remains under-explored. In this paper, we propose the heterogeneous attention based transformer (HAT), a novel SLT model that generates attention from diverse spatial and temporal contextual levels. Specifically, the proposed light dual-stream sparse attention-based module yields more effective visual-text representations than conventional transformers. Extensive experiments demonstrate that HAT achieves state-of-the-art performance on the challenging PHOENIX2014T benchmark dataset, with a BLEU-4 score of 25.33 on the test set.