Abstract: We developed a series of effective online data augmentation strategies for the traffic sign recognition dataset, which improve model performance without any extra computational overhead at prediction time. To enhance the feature extraction ability of the backbone with almost no extra model complexity, we developed an efficient CSAM module, placed at the beginning of the backbone, that combines a hybrid channel and spatial attention mechanism with a residual bottleneck structure. To make better use of the features extracted by the backbone, we combined the channel attention module CAM with the feature pyramid network (FPN) and path aggregation network (PAN) structure to form a multi‐scale attention feature fusion detection head.

Traffic sign recognition techniques are now used in driver assistance systems. However, recognizing small traffic signs in real scenes remains challenging because of class imbalance and the small size of the signs. To address these issues, a feature‐enhanced hybrid attention network based on YOLOv5s is proposed as a small, fast, and accurate traffic sign detector. First, a series of online data augmentation strategies is designed in the preprocessing module for model training. Second, the hybrid channel and spatial attention module CSAM is integrated into the backbone for better feature extraction. Third, the channel attention module CAM is used in the detection head for more efficient feature fusion. To validate the approach, extensive experiments are conducted on the Tsinghua‐Tencent 100K dataset. The method achieves state‐of‐the‐art performance with only negligible increases in model parameters and computational overhead. Specifically, the mAP@0.5, parameters, and FLOPs are 85.8%, 7.13 M, and 16.1 G, respectively.
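The reported mAP@0.5 uses the common convention that a predicted box counts as a correct detection only when it has the right class and overlaps a ground-truth box with an intersection over union (IoU) of at least 0.5. The following is a minimal sketch of that matching criterion, not the authors' evaluation code; the function names, the `(x1, y1, x2, y2)` box format, and the class label strings are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero so disjoint boxes give no overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_cls, gt_box, gt_cls, thresh=0.5):
    """Hypothetical helper: match rule behind the '@0.5' in mAP@0.5."""
    return pred_cls == gt_cls and iou(pred_box, gt_box) >= thresh


# Illustrative boxes only; the class name is a made-up label.
overlap = iou((0, 0, 10, 10), (2, 0, 12, 10))
matched = is_true_positive((0, 0, 10, 10), "speed_limit",
                           (2, 0, 12, 10), "speed_limit")
```

Average precision is then computed per class from detections ranked by confidence, and mAP is the mean over classes. For small signs, a shift of a few pixels can push the IoU below the threshold, which is one reason small-object detection is hard to score well on.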

Attention Mechanism Based on Improved Spatial-Temporal Convolutional Neural Networks for Traffic Police Gesture Recognition

A Channel-Wise Spatial-Temporal Aggregation Network for Action Recognition

Traffic Police 3D Gesture Recognition Based on Spatial–Temporal Fully Adaptive Graph Convolutional Network

Chinese Traffic Police Gesture Recognition Based on Graph Convolutional Network in Natural Scene

Visual Recognition of traffic police gestures with convolutional pose machine and handcrafted features

Traffic police command gesture recognition technology based on machine vision and two-stream spatio-temporal attention graph convolutional network

An Attentional Spatial Temporal Graph Convolutional Network with Co-Occurrence Feature Learning for Action Recognition

Spatial-Temporal Hypergraph Neural Network based on Attention Mechanism for Multi-view Data Action Recognition

Gesture recognition of traffic police based on static and dynamic descriptor fusion

Simple But Effective: Upper-Body Geometric Features for Traffic Command Gesture Recognition

B2C-AFM: Bi-Directional Co-Temporal and Cross-Spatial Attention Fusion Model for Human Action Recognition

A feature‐enhanced hybrid attention network for traffic sign recognition in real scenes

Unified Spatio-Temporal Attention Networks for Action Recognition in Videos

Spatial-temporal hypergraph based on dual-stage attention network for multi-view data lightweight action recognition

FFCSLT: a deep learning model for traffic police hand gesture recognition using surface electromyographic signals

Spatio-Temporal Attention Networks for Action Recognition and Detection

CANet: Comprehensive Attention Network for video-based action recognition

STCA: an action recognition network with spatio-temporal convolution and attention

Recognition of Distracted Driving Behavior Based on Improved Bi-LSTM Model and Attention Mechanism

Human Behavior Recognition Based on Attention Mechanism and Bottleneck Residual Dual-Path Spatiotemporal Graph Convolutional Network

Spatio-Temporal Adaptive Network with Bidirectional Temporal Difference for Action Recognition