Abstract. Intelligent vision computing is an active research field, and with the rapid development of computer vision, dynamic gesture recognition systems have attracted significant attention. However, automatically recognizing skeleton-based human gestures in the form of sign language is complex and challenging. Most existing methods treat skeleton-based human gesture recognition as a standard video recognition problem and ignore the rich structural information among joints and across gesture frames. Graph convolutional networks (GCNs) are a promising way to leverage this structural information to learn structured representations. However, applying GCNs to gesture sequences in both the spatial and temporal domains is challenging because the underlying graph can be highly nonlinear and complex. To overcome this issue, we propose a spatiotemporal GCN model, called Aegles, that leverages spatiotemporal correlations to adaptively construct spatiotemporal graphs. Our method dynamically attends to the most significant spatiotemporal joints and constructs different graphs (spatial, temporal, and spatiotemporal) to capture the structural information in gesture sequences. In addition, we introduce second-order information from the gesture skeleton data, i.e., the length and orientation of bones, to improve the representation of human hands and fingers. Furthermore, we use OpenPose to extract human gesture skeletons from public sign language datasets and obtain skeleton videos, building four skeleton-based sign language recognition datasets. Experimental results show that Aegles outperforms state-of-the-art methods and that the spatiotemporal correlations effectively boost the performance of human gesture recognition.
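The second-order information mentioned in the abstract (bone length and orientation) can be illustrated with a minimal sketch. The three-joint 2D layout, the parent/child bone list, and the names `JOINTS`, `BONES`, and `bone_features` are hypothetical and chosen only for illustration; the abstract does not specify how Aegles computes or encodes these features.

```python
import math

# Hypothetical 2D skeleton: joint index -> (x, y) keypoint coordinates.
# The layout and bone links are illustrative assumptions, not the paper's.
JOINTS = {
    0: (0.0, 0.0),  # wrist
    1: (3.0, 4.0),  # finger base
    2: (3.0, 6.0),  # fingertip
}
BONES = [(0, 1), (1, 2)]  # (parent joint, child joint)


def bone_features(joints, bones):
    """Return (length, unit direction) for each bone, pointing parent -> child."""
    features = []
    for parent, child in bones:
        dx = joints[child][0] - joints[parent][0]
        dy = joints[child][1] - joints[parent][1]
        length = math.hypot(dx, dy)
        # Guard against zero-length bones, e.g. when two keypoints coincide.
        direction = (dx / length, dy / length) if length > 0 else (0.0, 0.0)
        features.append((length, direction))
    return features


features = bone_features(JOINTS, BONES)
```

Each bone is represented here as the difference between its child and parent joint coordinates, split into a scalar length and a unit direction vector. In practice such features would be computed per frame and fed to the network alongside the first-order joint coordinates.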

Spatio-Temporal Dynamic Attention Graph Convolutional Network Based on Skeleton Gesture Recognition

Temporal Decoupling Graph Convolutional Network for Skeleton-Based Gesture Recognition

Spatial-Temporal Attention Res-TCN for Skeleton-Based Dynamic Hand Gesture Recognition

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

An Attentional Spatial Temporal Graph Convolutional Network with Co-Occurrence Feature Learning for Action Recognition

Human gesture recognition of dynamic skeleton using graph convolutional networks

Dynamic Spatio-Temporal Feature Learning via Graph Convolution in 3D Convolutional Networks

A Spatio-Temporal Graph Convolutional Network for Gesture Recognition from High-Density Electromyography

Spatial‐temporal Slowfast Graph Convolutional Network for Skeleton‐based Action Recognition

An Efficient Graph Convolution Network for Skeleton-Based Dynamic Hand Gesture Recognition

Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition

Multi-Stage Attention-Enhanced Sparse Graph Convolutional Network for Skeleton-Based Action Recognition

Prompt-supervised dynamic attention graph convolutional network for skeleton-based action recognition

Densely Connected and Multiple Temporal Graph Convolution Networks for Skeleton-based Action Recognition

Dynamic spatial-temporal topology graph network for skeleton-based action recognition

Skeleton-Based Gesture Recognition With Learnable Paths and Signature Features

Dynamic Hypergraph Convolutional Networks for Skeleton-Based Action Recognition

TSGCNeXt: Dynamic-Static Multi-Graph Convolution for Efficient Skeleton-Based Action Recognition with Long-term Learning Potential

Lightweight Multi-Scale Spatiotemporal Graph Convolutional Network for Skeleton-Based Action Recognition

Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition

STCN-GR: Spatial-Temporal Convolutional Networks for Surface-Electromyography-Based Gesture Recognition