SignGraph: A Sign Sequence is Worth Graphs of Nodes

Shiwei Gan,Yafeng Yin,Zhiwei Jiang,Hongkai Wen,Lei Xie,Sanglu Lu
DOI: https://doi.org/10.1109/cvpr52733.2024.01279
2024-01-01
Abstract:Despite the recent success of sign language research, the widely adopted CNN-based backbones are mainly migrated from other computer vision tasks, in which the contours and texture of objects are crucial for identifying objects. They usually treat sign frames as grids and may fail to capture effective cross-region features. In fact, sign language tasks need to focus on the correlation of different regions in one frame and the interaction of different regions among adjacent frames for identifying a sign sequence. In this paper, we propose to represent a sign sequence as graphs and introduce a simple yet effective graph-based sign language processing architecture named SignGraph, to extract crossregion features at the graph level. SignGraph consists of two basic modules: Local Sign Graph (LSG) module for learning the correlation of intra-frame cross-region features in one frame and Temporal Sign Graph (TSG) module for tracking the interaction of inter-frame cross-region features among adjacent frames. With LSG and TSG, we build our model in a multiscale manner to ensure that the representation of nodes can capture cross-region features at different granularities. Extensive experiments on current public sign language datasets demonstrate the superiority of our SignGraph model. Our model achieves very competitive performances with the SOTA model, while not using any extra cues. Code and models are available at: https://github.com/gswycf/SignGraph.
What problem does this paper attempt to address?