Abstract: Object detection and semantic segmentation are two fundamental techniques for Intelligent Vehicles (IV) and Advanced Driving Assistance Systems (ADAS). Motivated by recent studies demonstrating that the two tasks are highly correlated, this paper addresses joint object detection and semantic segmentation in traffic scenes. Existing methods perform the two tasks jointly by sharing a backbone network, but they ignore the connection between the separate detection and segmentation branches, so the two branches interact insufficiently. To address this, the paper proposes a joint object detection and semantic segmentation model with cross-attention and inner-attention mechanisms. The cross-attention mechanism builds the interaction between the detection branch and the segmentation branch so that their correlation is fully exploited, while the inner-attention mechanism strengthens the feature-map representations in the model. Given an image, an encoder-decoder network first extracts initial feature maps. The inner-attention mechanism then strengthens the initial feature maps to obtain segmentation feature maps. Next, the cross-attention mechanism uses the segmentation feature maps to guide the generation of detection feature maps. Finally, semantic segmentation is performed on the segmentation feature maps and object detection on the detection feature maps. In the experiments, the model is evaluated on two well-known public traffic datasets, where it achieves the highest performance among several recently proposed methods. Ablation studies on the inner-attention and cross-attention mechanisms validate their effectiveness.
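
The data flow described in the abstract (initial feature maps, then inner-attention, then cross-attention, then two task heads) can be illustrated with a small sketch. The abstract does not specify how either attention mechanism is computed, so everything below is an assumption made for illustration: both mechanisms are modelled as plain scaled dot-product attention with a residual connection and no learned weights, feature maps are flattened to lists of per-position vectors, and the names `attend` and `joint_forward` are hypothetical. It is not the authors' implementation.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys_values):
    """Scaled dot-product attention without learned projections (assumed form).

    queries: N vectors of length C; keys_values: M vectors of length C.
    Returns N vectors, each a weighted average of the key/value vectors.
    """
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[c] for w, kv in zip(weights, keys_values))
                    for c in range(len(q))])
    return out

def add(a, b):
    # Element-wise sum of two equally shaped lists of vectors (residual connection).
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def joint_forward(initial_feats):
    # Inner-attention, assumed here to be self-attention over the initial
    # features; its output plays the role of the segmentation feature maps.
    seg_feats = add(initial_feats, attend(initial_feats, initial_feats))
    # Cross-attention, assumed here to use the initial features as queries and
    # the segmentation features as keys/values, so that segmentation features
    # guide the detection feature maps.
    det_feats = add(initial_feats, attend(initial_feats, seg_feats))
    return seg_feats, det_feats

# Toy input: 3 spatial positions with 2 channels each, standing in for the
# encoder-decoder output.
initial = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seg, det = joint_forward(initial)
```

In a real model the segmentation head would consume `seg` and the detection head would consume `det`; the sketch stops at the feature maps because the abstract does not describe the heads.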

SimSANet: a simple sequential attention-aided deep neural network for vehicle make and model recognition

Vehicle Behavior Recognition using Multi-Stream 3D Convolutional Neural Network

A Joint Object Detection and Semantic Segmentation Model with Cross-Attention and Inner-Attention Mechanisms

CAM: A fine-grained vehicle model recognition method based on visual attention model

Moving vehicle tracking and scene understanding: A hybrid approach

Framework for Vehicle Make and Model Recognition-A New Large-Scale Dataset and an Efficient Two-Branch-Two-Stage Deep Learning Architecture

MT-IVSN: a novel model for vehicle re-identification

A Multi-Semantic Driver Behavior Recognition Model of Autonomous Vehicles Using Confidence Fusion Mechanism

Embedding Pose Information for Multiview Vehicle Model Recognition

Multi-View Spatial Attention Embedding for Vehicle Re-Identification

Attention-Mechanism-based Tracking Method for Intelligent Internet of Vehicles

Spatial-Assistant Encoder-Decoder Network for Real Time Semantic Segmentation

Vehicular Network Intrusion Detection Using a Cascaded Deep Learning Approach with Multi-Variant Metaheuristic

Unsupervised Feature Learning Toward a Real-time Vehicle Make and Model Recognition

Multi-axis interactive multidimensional attention network for vehicle re-identification

SAM: A Rethinking of Prominent Convolutional Neural Network Architectures for Visual Object Recognition

Automated Vehicle Recognition with Deep Convolutional Neural Networks

Fine-Grained Vehicle Model Recognition Using A Coarse-to-Fine Convolutional Neural Network Architecture

CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes

Vehicle 24-Color Long Tail Recognition Based on Smooth Modulation Neural Network with Multi-layer Feature Representation

TANet: Text region attention learning for vehicle re-identification