Abstract: The steadily rising number of traffic accidents makes it necessary to address distracted driving, which is responsible for numerous fatalities. Improving driver behavior recognition, particularly by developing a highly reliable Advanced Driver Assistance System (ADAS), holds substantial potential for safer transportation systems. Inspired by the success of Convolutional Neural Networks (CNNs), various networks have been proposed to improve the detection accuracy of distracting behaviors. However, existing models have large parameter counts and rely heavily on extensive labeled data, which makes labeling time-consuming and the models large. To address these limitations, we propose a distracted behavior detection method based on a lightweight vision transformer trained with pseudo-label-based semi-supervised learning, so that it learns discriminative representations from both labeled and unlabeled data while keeping the model small and inference fast. Specifically, we create strongly and weakly augmented versions of each input minibatch and employ a hybrid lightweight transformer model in a teacher-student network. Pseudo-labels are generated from the teacher's predictions on the weakly augmented data. The student model is trained to match these pseudo-labels on the strongly augmented input, while the teacher's parameters are updated through an exponential moving average (EMA). Our method offers a real-time, accurate solution for distracted driver detection, with the potential to significantly enhance road safety by reducing accidents. We demonstrate the effectiveness of the proposed approach through a comparative evaluation against alternative fully supervised and semi-supervised methods. Furthermore, we evaluate our method in naturalistic driving settings with varying lighting and complex backgrounds. Experiments on two publicly available driver distraction detection datasets show that our method outperforms current state-of-the-art approaches.
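
The teacher-student procedure described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation. The function names, the confidence threshold of 0.95, the EMA decay of 0.999, and the use of cross-entropy as the student's alignment loss are all assumptions made for illustration; the abstract specifies none of them. Parameters and logits are plain Python lists standing in for model weights and network outputs.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(teacher_logits, threshold=0.95):
    """Derive a hard pseudo-label from the teacher's prediction on a
    weakly augmented sample.

    Assumption: low-confidence predictions are discarded (returns None).
    The threshold value is hypothetical, not taken from the paper.
    """
    probs = softmax(teacher_logits)
    confidence = max(probs)
    if confidence < threshold:
        return None
    return probs.index(confidence)


def student_loss(student_logits, label):
    """Cross-entropy between the student's prediction on the strongly
    augmented sample and the teacher's pseudo-label (assumed loss)."""
    probs = softmax(student_logits)
    return -math.log(probs[label])


def ema_update(teacher_params, student_params, decay=0.999):
    """Move each teacher parameter toward the matching student parameter
    using an exponential moving average. The decay value is hypothetical."""
    return [
        decay * t + (1.0 - decay) * s
        for t, s in zip(teacher_params, student_params)
    ]


if __name__ == "__main__":
    # Teacher sees the weak view, student sees the strong view.
    label = pseudo_label([10.0, 0.0, 0.0])
    if label is not None:
        print("pseudo-label:", label)
        print("student loss:", student_loss([2.0, 1.0, 0.5], label))
    print("teacher after EMA:", ema_update([1.0, 2.0], [0.0, 4.0], decay=0.9))
```

In a real training loop the student's parameters would be updated by gradient descent on this loss (plus a supervised loss on the labeled data), and `ema_update` would run after each student step, so the teacher changes slowly and yields more stable pseudo-labels.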

FPT: Fine-Grained Detection of Driver Distraction Based on the Feature Pyramid Vision Transformer

A Transformer-Based Object Detector with Coarse-Fine Crossing Representations

Multimodal driver distraction detection using dual-channel network of CNN and Transformer

Distracted Driving Detection by Combining ViT and CNN

Traffic Sign Detection using Feature Fusion and Contextual Information

Driver Distraction Detection Using Semi-Supervised Lightweight Vision Transformer

Fine-Grained Detection of Driver Distraction Based on Neural Architecture Search

ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection

A lightweight model combining convolutional neural network and Transformer for driver distraction recognition

A Multi-Scale Feature Pyramid Fusion Network for Predicting the Driver's Visual Attention

Driver Distraction Recognition Based on Transfer Learning and Feature Fusion

Driver Distraction Behavior Detection Using a Vision Transformer Model Based on Transfer Learning Strategy

Event-based Driver Distraction Detection and Action Recognition

EBiDA-FPN: Enhanced Bi-Directional Attention Feature Pyramid Network for Object Detection

L-TLA: A Lightweight Driver Distraction Detection Method Based on Three-Level Attention Mechanisms

DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification

Multi-Task Foreground-Aware Network with Depth Completion for Enhanced RGB-D Fusion Object Detection Based on Transformer

TransConvNet: Perform Perceptually Relevant Driver’s Visual Attention Predictions

Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition

STFormer3D: Spatio-Temporal Transformer Based 3D Object Detection for Intelligent Driving