Do You Act Like You Talk? Exploring Pose-based Driver Action Classification with Speech Recognition Networks

L. Bergasa, Santiago Montiel-Marín, Angel Llamazares, Miguel Antunes, Pablo Pardo-Decimavilla
DOI: https://doi.org/10.1109/IV55156.2024.10588839
2024-06-02
Abstract: Recognizing distractions on the road is crucial to reduce traffic accidents. Video-based networks are typically used, but they are limited by their computational cost and are vulnerable to viewpoint changes. In this paper, we propose a novel approach for pose-based driver action classification using speech recognition networks, which is lighter and more viewpoint-invariant than video-based ones. We leverage the similarity in the encoding of information between audio and pose data, representing poses as key points over time. Our architecture is based on Squeezeformer, an efficient attention-based speech recognition network. We introduce a selection of data augmentation techniques to enhance generalization. Experiments on the Drive&Act dataset demonstrate superior performance compared to state-of-the-art methods. Additionally, we explore the integration of object information and the impact of viewpoint changes. Our results highlight the effectiveness and robustness of speech recognition networks in pose-based action classification.
Engineering, Computer Science
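
The abstract's analogy between audio and pose data can be illustrated with a small sketch. A speech network typically consumes a sequence of per-frame feature vectors; a pose clip can be put in the same layout by flattening the key points of each video frame into one vector. The function name, the 13 key points, the 3 coordinates, and the 90-frame clip below are all assumptions made for illustration. The abstract does not specify them, and this is not the authors' preprocessing.

```python
from typing import List, Sequence

# One frame of pose data: a list of key points, each a list of coordinates.
Frame = Sequence[Sequence[float]]


def poses_to_feature_sequence(poses: Sequence[Frame]) -> List[List[float]]:
    """Flatten each frame's key points into one feature vector.

    The result has one row per time step, the same (time, features)
    layout a speech model expects from audio frames. Hypothetical
    helper, not taken from the paper.
    """
    features = []
    for frame in poses:
        # Concatenate the coordinates of every key point in this frame.
        row = [float(value) for keypoint in frame for value in keypoint]
        features.append(row)

    # Every time step must yield a vector of the same width.
    if len({len(row) for row in features}) > 1:
        raise ValueError("all frames must have the same number of values")
    return features


# Assumed toy clip: 90 frames, 13 key points, 3 coordinates per key point.
clip = [[[float(t), float(k), 0.0] for k in range(13)] for t in range(90)]
features = poses_to_feature_sequence(clip)
print(len(features), len(features[0]))  # 90 39
```

Under these assumptions, each of the 90 time steps carries a 39-dimensional vector, which could then be fed to a sequence model in place of audio features.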