ESDAR-net: Towards High-Accuracy and Real-Time Driver Action Recognition for Embedded Systems

Yaocong Hu, Zhen Shuai, Huicheng Yang, Guoyang Wan, Yajun Zhang, Chao Xie, Mingqi Lu, Xiaobo Lu
DOI: https://doi.org/10.1007/s11042-023-15777-0
IF: 2.577
2024-01-01
Multimedia Tools and Applications
Abstract: Existing driver action recognition approaches suffer from a bottleneck: the trade-off between recognition accuracy and computational efficiency. More specifically, high-capacity spatial-temporal deep learning models cannot achieve real-time driver action recognition on vehicle-mounted devices. To overcome this limitation, this paper puts forward a novel driver action recognition solution suitable for embedded systems. The proposed ESDAR-Net is a multi-branch deep learning framework that directly processes compressed videos. To reduce the computational cost, a lightweight 2D/3D convolutional network is employed for spatial-temporal modeling. Moreover, two strategies are implemented to boost accuracy: (1) a cross-layer connection module (CLCM) and a spatial-temporal trilinear pooling module (STTPM) are designed to adaptively fuse appearance and motion information; (2) complementary knowledge from a high-capacity spatial-temporal deep learning model is distilled and transferred to the proposed ESDAR-Net. Experimental results show that the proposed ESDAR-Net achieves both high accuracy and real-time performance for driver action recognition. The accuracy reaches 98.7% on SEU-DAR-V1 and 96.5% on SEU-DAR-V2, with 2.19M learnable parameters, 0.253G FLOPs, and a speed of 27 clips/s on JETSON TX2.
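The abstract does not specify how knowledge is distilled from the high-capacity model. As an illustration only, the following minimal sketch shows the widely used soft-target formulation of knowledge distillation, which may differ from the paper's actual objective. The function names, the temperature of 4.0, and the weighting factor `alpha` are assumptions chosen for the example, not values taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Hypothetical combined loss: hard-label cross-entropy plus soft-target KL.

    temperature and alpha are illustrative defaults, not the paper's settings.
    """
    # Cross-entropy of the student against the ground-truth class index.
    hard = -math.log(softmax(student_logits)[label])
    # KL divergence from the softened teacher distribution to the student's.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft
```

In this sketch the soft term vanishes when the lightweight student reproduces the teacher's logits exactly, so the loss then reduces to the weighted cross-entropy on the ground-truth action label.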