Abstract: Over the past decade, human activity recognition (HAR) with WiFi signals has shown considerable potential owing to its non-invasiveness and ubiquity. Previous research has largely concentrated on improving precision through sophisticated models, while the complexity of the recognition task itself has been largely neglected. As a result, the performance of HAR systems degrades markedly as task complexity increases, for example with a larger number of classes, confusion between similar actions, and signal distortion. To address this issue, we eliminated conventional convolutional and recurrent backbones and proposed WiTransformer, a novel approach based on pure Transformers. However, experience with the Vision Transformer suggests that Transformer-like models are typically suited to large-scale datasets as pretrained models. Therefore, we adopted the Body-coordinate Velocity Profile, a cross-domain WiFi signal feature derived from channel state information, to lower the amount of data the Transformers need. On this basis, we propose two modified Transformer architectures, the united spatiotemporal Transformer (UST) and the separated spatiotemporal Transformer (SST), to realize WiFi-based human gesture recognition models that are robust to task complexity. SST extracts spatial and temporal features with two separate encoders. By contrast, UST extracts the same three-dimensional features with only a one-dimensional encoder, owing to its well-designed structure. We evaluated SST and UST on four designed task datasets (TDSs) of varying task complexity. The experimental results demonstrate that UST achieves a recognition accuracy of 86.16% on the most complex task dataset, TDSs-22, outperforming other popular backbones. Moreover, its accuracy decreases by at most 3.18% when the task complexity increases from TDSs-6 to TDSs-22, which is 0.14 to 0.2 times the decrease of the other backbones. However, as predicted and analyzed, SST fails because of its excessive lack of inductive bias and the limited scale of the training data.
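The robustness claim can be made more concrete with a small calculation. The sketch below assumes that the reported 3.18% is a drop in percentage points and that "0.14 to 0.2 times" is the ratio of that drop to the drops of the compared backbones; both readings are assumptions, and the function name is hypothetical. Under these assumptions it back-computes the accuracy drops implied for the other backbones. The abstract itself does not report those values.

```python
def implied_baseline_drop(own_drop: float, ratio: float) -> float:
    """Return the baseline's accuracy drop, given own_drop = ratio * baseline_drop."""
    return own_drop / ratio


# Figures quoted in the abstract (interpreted as percentage points: an assumption).
ust_drop = 3.18                       # largest accuracy drop, TDSs-6 -> TDSs-22
ratio_low, ratio_high = 0.14, 0.20    # that drop as a fraction of the other backbones' drops

# A smaller ratio implies a larger drop for the compared backbone.
largest_baseline_drop = implied_baseline_drop(ust_drop, ratio_low)
smallest_baseline_drop = implied_baseline_drop(ust_drop, ratio_high)

print(f"Implied drop of other backbones: "
      f"{smallest_baseline_drop:.2f} to {largest_baseline_drop:.2f} percentage points")
```

Under this reading, the compared backbones would lose roughly 16 to 23 percentage points over the same increase in task complexity.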
