Contrastive Disentangled Meta-Learning for Signer-Independent Sign Language Translation.

Tao Jin, Zhou Zhao
DOI: https://doi.org/10.1145/3474085.3475456
2021-01-01
Abstract: Sign language translation aims to translate a sign language video directly into a natural-language sentence. Most existing methods use video-sentence pairs labeled by multiple specific signers as both training and testing samples. However, such a setting does not match real-world applications: a practical sign language translation system should provide accurate translations for unseen signers. In this paper, we address the signer-independent setting and focus on improving the generalization ability of the translation model. To adapt to this challenging setting, we propose a novel framework called contrastive disentangled meta-learning (CDM), which introduces improvements in both the deep architecture and the training mode. Specifically, based on the minimax entropy objective, a disentangled module with adaptive gated units is developed to decouple the signer-specific and task-specific representations in the encoder. In addition, we facilitate frame-word alignment by imposing contrastive constraints between the obtained task-specific representation and the decoding output. The disentangled and contrastive modules provide complementary information to each other. As for the training mode, we encourage the model to perform well in simulated signer-independent scenarios by finding generalized learning directions in the meta-learning process. Because vanilla meta-learning methods make insufficient use of the multiple specific signers, we adopt a fine-grained learning strategy that simultaneously conducts meta-learning in a variety of domain-shift scenarios in each iteration. Extensive experiments on the benchmark dataset RWTH-PHOENIX-Weather-2014T (PHOENIX14T) show that CDM achieves competitive results compared with state-of-the-art methods.
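The abstract does not spell out how the simulated signer-independent scenarios are constructed. As a rough illustration only, one plausible reading is that each scenario holds out some training signers as a stand-in for unseen signers, and that several such partitions are visited in every iteration. The sketch below enumerates those partitions; the function name `signer_splits`, the parameter `n_held_out`, and the signer IDs are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def signer_splits(signers, n_held_out=1):
    """Yield (meta_train, meta_test) signer partitions.

    Assumption (not from the paper): each partition holds out
    `n_held_out` signers as a simulated unseen-signer domain, and the
    remaining signers form the meta-train side.
    """
    signers = sorted(set(signers))
    if not 0 < n_held_out < len(signers):
        raise ValueError("n_held_out must leave at least one signer on each side")
    for held_out in combinations(signers, n_held_out):
        meta_test = list(held_out)
        meta_train = [s for s in signers if s not in held_out]
        yield meta_train, meta_test


# Hypothetical signer IDs; every partition could be visited in one iteration.
splits = list(signer_splits(["s1", "s2", "s3"]))
for meta_train, meta_test in splits:
    print("meta-train:", meta_train, "meta-test:", meta_test)
```

Enumerating every partition, rather than sampling a single one, is one way to expose the model to several domain shifts per iteration; the actual CDM strategy may differ.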