Multi-modal Sign Language Spotting by Multi/One-Shot Learning

Landong Liu, Wengang Zhou, Weichao Zhao, Hezhen Hu, Houqiang Li
DOI: https://doi.org/10.1007/978-3-031-25085-9_15
2022-01-01
Abstract: The sign spotting task aims to identify whether and where an isolated sign of interest occurs in a continuous sign language video. It has recently received substantial attention as a promising tool for annotating large-scale sign language data. Previous methods used multiple sources of available supervision to localize sign actions in the RGB domain. However, these methods overlook the complementary nature of different modalities, i.e., RGB, optical flow, and pose, which benefit the sign spotting task. To this end, we propose a framework that merges multiple modalities for multiple-shot supervised learning. Furthermore, we explore sign spotting in the one-shot setting, which requires fewer annotations and has broader applications. To evaluate our approach, we participated in the Sign Spotting Challenge held at ECCV 2022. The competition contains two tracks: multiple-shot supervised learning (MSSL, track 1) and one-shot learning with weak labels (OSLWL, track 2). In track 1, our method achieves an F1-score of about 0.566 and ranks 2nd. In track 2, it ranks 1st with an F1-score of 0.6. These results demonstrate the effectiveness of the proposed method. We hope our solution will provide insight for future research in the community.
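To make the task framing concrete, here is a minimal sketch under stated assumptions; it is not the authors' method. It assumes each modality (RGB, optical flow, pose) already yields a per-frame score for "the query sign is here." It fuses these by a simple weighted average (late fusion), thresholds the fused scores into temporal segments, and scores predictions with an F1 based on one-to-one temporal-IoU matching. The function names, the fusion weights, the 0.5 threshold and the greedy IoU matching are all hypothetical illustrations, not the paper's fusion scheme or the challenge's official evaluation protocol.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # [start, end) frame indices


def fuse_scores(scores: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Hypothetical late fusion: weighted average of per-frame scores across modalities."""
    names = list(scores)
    n = len(scores[names[0]])
    if any(len(scores[m]) != n for m in names):
        raise ValueError("all modalities must cover the same frames")
    total = sum(weights[m] for m in names)
    return [sum(weights[m] * scores[m][t] for m in names) / total for t in range(n)]


def spot_segments(frame_scores: List[float], threshold: float) -> List[Segment]:
    """Group consecutive frames whose fused score is >= threshold into [start, end) segments."""
    segments: List[Segment] = []
    start = None
    for t, s in enumerate(frame_scores):
        if s >= threshold and start is None:
            start = t
        elif s < threshold and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:  # close a segment that runs to the last frame
        segments.append((start, len(frame_scores)))
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two [start, end) intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def segment_f1(pred: List[Segment], gt: List[Segment], iou_thr: float = 0.5) -> float:
    """F1 with greedy one-to-one matching: a prediction is a true positive if it
    overlaps a still-unmatched ground-truth segment with IoU >= iou_thr."""
    matched = set()
    tp = 0
    for p in pred:
        best, best_j = 0.0, None
        for j, g in enumerate(gt):
            if j in matched:
                continue
            v = temporal_iou(p, g)
            if v >= iou_thr and v > best:
                best, best_j = v, j
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gt)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy per-frame scores for one query sign over a 5-frame video.
    scores = {
        "rgb": [0.1, 0.8, 0.9, 0.2, 0.7],
        "flow": [0.2, 0.6, 0.7, 0.1, 0.9],
        "pose": [0.0, 0.7, 0.8, 0.3, 0.8],
    }
    weights = {"rgb": 1.0, "flow": 1.0, "pose": 1.0}
    pred = spot_segments(fuse_scores(scores, weights), threshold=0.5)
    print(pred)  # [(1, 3), (4, 5)]
    print(segment_f1(pred, gt=[(1, 3), (6, 8)]))  # 0.5
```

With equal weights, the fused scores are roughly 0.1, 0.7, 0.8, 0.2 and 0.8. The two spotted segments are therefore (1, 3) and (4, 5). Only the first overlaps a ground-truth segment, which gives precision and recall of 0.5 and an F1 of 0.5. Averaging scores is the simplest way to exploit complementary modalities. Learned or per-modality-calibrated weights would be a natural refinement.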