Conditional Sentence Generation and Cross-modal Reranking for Sign Language Translation

Jian Zhao, Weizhen Qi, Wengang Zhou, Duan Nan, Ming Zhou, Houqiang Li
DOI: https://doi.org/10.1109/tmm.2021.3087006
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract: Sign Language Translation (SLT) aims to generate spoken language translations from sign language videos. Currently available sign language datasets are too small to learn the linguistic properties of spoken language. In this paper, towards effective SLT, we propose a novel framework that takes advantage of the spoken language grammar learnt from a large corpus of text sentences. Our framework consists of three key modules: word existence verification, conditional sentence generation, and cross-modal re-ranking. We first check whether each word in the vocabulary appears, using a series of binary classifications run in parallel. After that, the words found to appear are assembled, guided by a pretrained spoken language generator, to produce multiple candidate sentences in the manner of spoken language. Finally, a cross-modal re-ranking model selects the candidate sentence most semantically similar to the input sign video as the translation result. We evaluate our framework on two large-scale continuous SLT benchmarks, i.e., CSL and RWTH-PHOENIX-Weather 2014 T. Experimental results demonstrate that the proposed framework achieves promising performance on both datasets.
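The three modules named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.5 threshold, the permutation-based stand-in for the pretrained generator, and the toy position-weighted embedding are all assumptions made purely for illustration. In the paper, the classifiers, the generator, and the re-ranker are learned models.

```python
import math
from itertools import islice, permutations


def verify_words(logits, vocab, threshold=0.5):
    """Stage 1 (sketch): one independent sigmoid decision per vocabulary word."""
    return [w for w, z in zip(vocab, logits)
            if 1.0 / (1.0 + math.exp(-z)) >= threshold]


def generate_candidates(words, max_candidates=6):
    """Stage 2 (placeholder): enumerate word orderings as candidate sentences.
    The paper uses a pretrained spoken language generator here instead."""
    return [" ".join(p) for p in islice(permutations(words), max_candidates)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(video_emb, candidates, embed_text):
    """Stage 3 (sketch): keep the candidate whose embedding is closest to the video's."""
    return max(candidates, key=lambda s: cosine(video_emb, embed_text(s)))


def toy_embed(sentence, vocab):
    """Hypothetical text embedding: each word contributes 1 / (position + 1)."""
    vec = [0.0] * len(vocab)
    for pos, word in enumerate(sentence.split()):
        vec[vocab.index(word)] += 1.0 / (pos + 1)
    return vec


vocab = ["rain", "tomorrow", "sunny", "north"]
logits = [2.0, 1.5, -3.0, -0.5]    # made-up classifier outputs, one per word
video_emb = [0.5, 1.0, 0.0, 0.0]   # made-up video embedding in the same space

words = verify_words(logits, vocab)
candidates = generate_candidates(words)
best = rerank(video_emb, candidates, lambda s: toy_embed(s, vocab))
print(words, candidates, best)
```

The point of the sketch is the data flow: per-word binary decisions produce an unordered word set, a generator turns that set into several ordered candidates, and a similarity score against the video picks one of them.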
computer science, information systems, telecommunications, software engineering
What problem does this paper attempt to address?