Risky Action Recognition in Lane Change Video Clips using Deep Spatiotemporal Networks with Segmentation Mask Transfer

Ekim Yurtsever, Yongkang Liu, Jacob Lambert, Chiyomi Miyajima, Eijiro Takeuchi, Kazuya Takeda, John H. L. Hansen
DOI: https://doi.org/10.1109/itsc.2019.8917362
2019-10-01
Abstract: Advanced driver assistance and automated driving systems rely on risk estimation modules to predict and avoid dangerous situations. Current methods use expensive sensor setups and complex processing pipelines, limiting their availability and robustness. To address these issues, we introduce a novel deep learning-based driving risk assessment framework for classifying dangerous lane change behavior in short video clips captured by a monocular camera. First, semantic segmentation masks were generated from individual video frames with a pre-trained Mask R-CNN model. Then, frames overlaid with these masks were fed into a time-distributed CNN-LSTM network with a final softmax classification layer. This network was trained on a semi-naturalistic lane change dataset with annotated risk labels. A comprehensive comparison of state-of-the-art pre-trained feature extractors was carried out to find the best network layout and training strategy. The best result, with a 0.937 AUC score, was obtained with the proposed framework. Our code and trained models are available as open source.
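To help interpret the reported AUC score, the following is a minimal sketch of how an area-under-the-ROC-curve value can be computed for a binary risky/safe classifier. It rests on several assumptions not stated in the abstract: that the score is the ROC AUC, that labels are encoded as 1 (risky) and 0 (safe), and that the softmax output for the risky class serves as the ranking score. The function name `roc_auc` and the toy data are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a risky clip outscores a safe one.

    labels: 1 = risky, 0 = safe (assumed encoding, not from the paper).
    scores: predicted probability of the risky class.
    A tie between a risky and a safe score counts as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    risky = [s for y, s in zip(labels, scores) if y == 1]
    safe = [s for y, s in zip(labels, scores) if y == 0]
    if not risky or not safe:
        raise ValueError("need at least one clip of each class")

    # Compare every risky clip against every safe clip.
    wins = 0.0
    for r in risky:
        for s in safe:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(risky) * len(safe))


# Toy example with made-up scores (not data from the paper).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Under this reading, an AUC of 0.937 would mean that a randomly chosen risky clip receives a higher risk score than a randomly chosen safe clip about 94% of the time. The pairwise loop is quadratic in the number of clips; it is chosen here for clarity, and a rank-based implementation would be preferable for large datasets.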