Abstract: Driver behavior recognition has become one of the most important tasks for intelligent vehicles. This task, however, is very challenging because the backgrounds of real-world driving scenes are often highly complex. More critically, the differences between driving behaviors are often subtle, which makes them extremely difficult to distinguish. Existing methods often rely on RGB frames or skeleton data alone. As a result, they may fail to capture both the subtle differences between behaviors and the appearance of relevant objects, which limits their performance. To address these issues, in this paper we propose a bidirectional posture-appearance interaction network (BPAI-Net), which jointly uses RGB frames and skeleton (i.e., posture) data for driver behavior recognition. Specifically, we propose a posture-guided convolutional neural network (PG-CNN) and an appearance-guided graph convolutional network (AG-GCN) to extract appearance and posture features, respectively. To exploit the complementary information between appearance and posture, we use the appearance features from PG-CNN to guide AG-GCN in exploiting contextual information (e.g., nearby objects) to enhance the posture features. We then use the enhanced posture features from AG-GCN to help PG-CNN focus on the critical local areas of video frames that are related to driver behaviors. In this way, the interaction between the two modalities yields more discriminative features and thus improves recognition accuracy. Experimental results on the Drive&Act dataset show that our method outperforms state-of-the-art methods by a large margin (67.83% vs. 63.64%). Furthermore, we collect a bus driver behavior recognition dataset, on which our method achieves consistent performance gains over baseline methods, demonstrating its effectiveness in real-world applications. The source code and trained models are available at github.com/SCUT-AILab/BPAI-Net/.
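To make the bidirectional interaction described in the abstract more concrete, here is a minimal NumPy sketch of one possible reading of it. It is not the BPAI-Net implementation; the authoritative code is in the repository linked above. The function names (`normalized_adjacency`, `appearance_guided_gcn`, `posture_guided_attention`) are hypothetical. The specific fusion choices are assumptions made for illustration: appearance vectors are sampled at each joint's pixel location and concatenated to the joint features before a single graph-convolution layer, and per-joint scores are turned into Gaussian spatial attention over the appearance map.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized skeleton adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def appearance_guided_gcn(joint_feats, joint_xy, app_map, adj, w):
    """Appearance -> posture (assumed design): enrich each joint with the appearance vector under it.

    joint_feats: (V, Dp) posture features; joint_xy: (V, 2) integer pixel coords (x, y)
    app_map: (C, H, W) appearance feature map; adj: (V, V); w: (Dp + C, Dout)
    """
    xs, ys = joint_xy[:, 0], joint_xy[:, 1]
    context = app_map[:, ys, xs].T               # (V, C) appearance context at each joint
    fused = np.concatenate([joint_feats, context], axis=1)
    return np.maximum(adj @ fused @ w, 0.0)      # one graph-conv layer followed by ReLU

def posture_guided_attention(app_map, joint_xy, joint_feats, sigma=2.0):
    """Posture -> appearance (assumed design): emphasize regions around important joints.

    Each joint adds a Gaussian bump at its location, weighted by a sigmoid
    score computed from its (enhanced) posture feature.
    """
    _, h, w = app_map.shape
    yy, xx = np.mgrid[0:h, 0:w]
    scores = 1.0 / (1.0 + np.exp(-joint_feats.mean(axis=1)))  # (V,) per-joint importance
    att = np.zeros((h, w))
    for (x, y), s in zip(joint_xy, scores):
        att += s * np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
    att = att / (att.max() + 1e-8)               # rescale attention to [0, 1]
    return app_map * (1.0 + att)[None]           # residual attention keeps global context

# Toy usage with random features (hypothetical sizes: 5 joints on a 14x14 map)
rng = np.random.default_rng(0)
V, Dp, C, H, W = 5, 8, 16, 14, 14
adj = normalized_adjacency([(0, 1), (1, 2), (2, 3), (3, 4)], V)
joint_xy = rng.integers(0, 14, size=(V, 2))
joint_feats = rng.standard_normal((V, Dp))
app_map = rng.standard_normal((C, H, W))
w = rng.standard_normal((Dp + C, 32)) * 0.1

enhanced_pose = appearance_guided_gcn(joint_feats, joint_xy, app_map, adj, w)
refined_app = posture_guided_attention(app_map, joint_xy, enhanced_pose)
print(enhanced_pose.shape, refined_app.shape)  # (5, 32) (16, 14, 14)
```

The residual form `app_map * (1 + att)` is one simple way to let posture emphasize local regions without suppressing the rest of the frame. A learned attention module, as a real network would likely use, could replace the fixed Gaussian bumps.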

Do You Act Like You Talk? Exploring Pose-based Driver Action Classification with Speech Recognition Networks

ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition

Pose-guided multi-task video transformer for driver action recognition

Driver Activity Classification Using Generalizable Representations from Vision-Language Models

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer

Bidirectional Posture-Appearance Interaction Network for Driver Behavior Recognition

Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition

A Hybrid Deep Learning Model for Recognizing Actions of Distracted Drivers

Multimodal driver emotion recognition using motor activity and facial expressions

Risky Action Recognition in Lane Change Video Clips using Deep Spatiotemporal Networks with Segmentation Mask Transfer

No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration

Neural network-based method for visual recognition of driver's voice commands using attention mechanism

A Spatio-Temporal Multilayer Perceptron for Gesture Recognition

Traffic police command gesture recognition technology based on machine vision and two-stream spatio-temporal attention graph convolutional network

Lane Change Classification and Prediction with Action Recognition Networks

Action-Based Representation Learning for Autonomous Driving

Action Recognition in Videos through a Transfer-Learning-Based Technique

It's all about you: Personalized in-Vehicle Gesture Recognition with a Time-of-Flight Camera

Drive&Act: A Multi-Modal Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles