Abstract: In recent years, skeleton-based action recognition using multimodal Graph Convolutional Networks (GCNs) has achieved remarkable results. However, because of their deep structure and reliance on continuous floating-point operations, GCN-based methods are energy-intensive. To address this issue, we propose an innovative Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation (MK-SGN). By combining the energy efficiency of Spiking Neural Networks (SNNs) with the graph representation capability of GCNs, the proposed MK-SGN reduces energy consumption while maintaining recognition accuracy. First, we convert GCNs into Spiking Graph Convolutional Networks (SGNs), establishing a new benchmark and paving the way for future research. In this process, we introduce a spiking attention mechanism and design a Spiking-Spatio Graph Convolution module with a Spatial Global Spiking Attention mechanism (SA-SGC), which enhances feature learning. Second, we propose a Spiking Multimodal Fusion module (SMF) that leverages mutual information to process multimodal data more efficiently. Finally, we investigate knowledge distillation from multimodal GCNs to SGNs and propose a novel integrated method that combines intermediate-layer distillation with soft-label distillation to improve SGN performance. On three challenging skeleton-based action recognition datasets, MK-SGN consumes less energy than state-of-the-art GCN-like frameworks and achieves higher accuracy than state-of-the-art SNN frameworks. Specifically, our method reduces energy consumption by more than 98% compared to typical GCN-based methods while maintaining competitive accuracy on the NTU-RGB+D 60 cross-subject split using 4 time steps.
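The core idea of turning a GCN into a spiking GCN can be sketched as follows. At each of T time steps, a graph convolution over the skeleton's adjacency matrix produces an input current, and leaky integrate-and-fire (LIF) neurons turn that current into binary spikes. This is a minimal pure-Python sketch under our own assumptions, not MK-SGN's implementation. The symmetric adjacency normalization, the LIF decay of 0.5, the threshold of 1.0, the hard reset, and all function names are hypothetical choices. The SA-SGC attention, SMF fusion, and distillation components are not modeled.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN adjacency normalization (assumed here)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lif_step(v, current, decay=0.5, threshold=1.0):
    """One hypothetical LIF update per neuron, with hard reset to 0 after a spike."""
    new_v, spikes = [], []
    for v_row, c_row in zip(v, current):
        v_out, s_out = [], []
        for vi, ci in zip(v_row, c_row):
            h = decay * vi + ci              # leak the old potential, integrate the new input
            s = 1 if h >= threshold else 0   # fire when the threshold is reached
            s_out.append(s)
            v_out.append(0.0 if s else h)    # hard reset after firing
        new_v.append(v_out)
        spikes.append(s_out)
    return new_v, spikes


def spiking_graph_conv(spike_inputs, adj, weight, decay=0.5, threshold=1.0):
    """Apply A_norm @ S_t @ W at every time step t and feed the result to LIF neurons.

    spike_inputs: list of T binary matrices, each (num_joints x in_dim).
    Returns a list of T binary spike matrices, each (num_joints x out_dim).
    """
    a_norm = normalize_adjacency(adj)
    v = [[0.0] * len(weight[0]) for _ in range(len(adj))]  # membrane potentials start at 0
    outputs = []
    for s_t in spike_inputs:
        current = matmul(a_norm, matmul(s_t, weight))  # graph convolution on binary spikes
        v, spikes = lif_step(v, current, decay, threshold)
        outputs.append(spikes)
    return outputs


if __name__ == "__main__":
    # A constant input of 0.6 needs three steps of leaky integration to fire once.
    v, pattern = [[0.0]], []
    for _ in range(4):
        v, s = lif_step(v, [[0.6]])
        pattern.append(s[0][0])
    print(pattern)  # → [0, 0, 1, 0]

    # Toy 3-joint chain (0-1-2), 2 input channels, 2 output channels, T = 4 time steps.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    spikes_in = [[[1, 1], [1, 1], [1, 1]] for _ in range(4)]
    weight = [[0.4, -0.5], [0.3, 0.2]]
    out = spiking_graph_conv(spikes_in, adj, weight)
    rates = [[sum(out[t][j][c] for t in range(4)) / 4 for c in range(2)] for j in range(3)]
    print(rates)  # → [[0.25, 0.0], [0.5, 0.0], [0.25, 0.0]]
```

Because each input `s_t` is binary, `s_t @ W` only selects and sums rows of `W`. The feature transform therefore needs additions rather than multiply-accumulates, which is the usual argument for why SNNs can consume less energy than floating-point GCNs. In the toy example, the centre joint has two neighbours, so it receives the largest aggregated current and fires most often.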

MKE-GCN: Multi-Modal Knowledge Embedded Graph Convolutional Network for Skeleton-Based Action Recognition in the Wild

Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition

Multi-Scale Adaptive Aggregate Graph Convolutional Network for Skeleton-Based Action Recognition

MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition

Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks

Multi-Stage Attention-Enhanced Sparse Graph Convolutional Network for Skeleton-Based Action Recognition

Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Multi-Scale Adaptive Graph Convolution Network for Skeleton-Based Action Recognition

MGSAN: multimodal graph self-attention network for skeleton-based action recognition

DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition

SelfGCN: Graph Convolution Network With Self-Attention for Skeleton-Based Action Recognition

An improved spatial temporal graph convolutional network for robust skeleton-based action recognition

Richly Activated Graph Convolutional Network for Robust Skeleton-Based Action Recognition

Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition

Lighter and faster: A multi-scale adaptive graph convolutional network for skeleton-based action recognition

A Multi-Stream Graph Convolutional Networks-Hidden Conditional Random Field Model for Skeleton-Based Action Recognition

Multi‐temporal scale aggregation refinement graph convolutional network for skeleton‐based action recognition

Body Prior Guided Graph Convolutional Neural Network for Skeleton-Based Action Recognition

Multi-Dimensional Refinement Graph Convolutional Network with Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

Enhanced Adjacency Matrix-Based Lightweight Graph Convolution Network for Action Recognition