Abstract: Object detection and semantic segmentation are two fundamental techniques for Intelligent Vehicles (IV) and Advanced Driving Assistance Systems (ADAS). Motivated by recent studies demonstrating that the two tasks are highly correlated, this paper addresses joint object detection and semantic segmentation in traffic scenes. Existing methods perform the two tasks jointly by sharing a backbone network, but they ignore the connection between the separate detection and segmentation branches, so the two branches interact insufficiently. To address this, the paper proposes a joint object detection and semantic segmentation model with cross-attention and inner-attention mechanisms. The cross-attention mechanism builds the interaction between the detection branch and the segmentation branch so that their correlation is fully exploited, while the inner-attention mechanism strengthens the representations of the feature maps in the model. Given an image, an encoder-decoder network first extracts initial feature maps. The inner-attention mechanism then strengthens the initial feature maps to obtain segmentation feature maps. Next, the cross-attention mechanism uses the segmentation feature maps to guide the generation of object detection feature maps. Finally, semantic segmentation is performed on the segmentation feature maps and object detection is performed on the detection feature maps. In the experiments, two well-known public traffic datasets are used to evaluate the model, which achieves the highest performance in comparison with several recently proposed methods. In addition, ablation studies are conducted on the proposed inner-attention and cross-attention mechanisms, and the experimental results validate their effectiveness.
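The data flow described in the abstract (initial features, then inner-attention for segmentation features, then cross-attention for detection features) can be illustrated with a small NumPy sketch. The abstract does not specify the form of either attention mechanism, so everything below is an assumption made for illustration: `inner_attention` is written as a simple channel gate, `cross_attention` as a spatial gate with a residual connection, and the random array stands in for the output of the encoder-decoder network. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def inner_attention(feat):
    """Hypothetical inner-attention: gate each channel of a (C, H, W) map.

    Assumption: a channel gate computed from globally pooled statistics.
    The paper's actual mechanism may differ.
    """
    channel_desc = feat.mean(axis=(1, 2))        # (C,) global average pool
    gate = sigmoid(channel_desc)[:, None, None]  # (C, 1, 1), values in (0, 1)
    return feat * gate


def cross_attention(seg_feat, init_feat):
    """Hypothetical cross-attention: segmentation features guide detection.

    Assumption: the segmentation features are collapsed into one spatial
    map that reweights the initial features, with a residual connection.
    """
    spatial_map = sigmoid(seg_feat.mean(axis=0, keepdims=True))  # (1, H, W)
    return init_feat * spatial_map + init_feat


def joint_forward(init_feat):
    """Follow the order given in the abstract: inner first, then cross."""
    seg_feat = inner_attention(init_feat)            # segmentation branch
    det_feat = cross_attention(seg_feat, init_feat)  # detection branch
    return seg_feat, det_feat


# Stand-in for encoder-decoder output: 8 channels on a 16x16 grid.
rng = np.random.default_rng(0)
init_feat = rng.standard_normal((8, 16, 16))
seg_feat, det_feat = joint_forward(init_feat)
print(seg_feat.shape, det_feat.shape)
```

In a full model, `seg_feat` would feed a segmentation head and `det_feat` a detection head; both heads are omitted here because the abstract gives no detail about them.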

Intersection Perception Through Real-Time Semantic Segmentation to Assist Navigation of Visually Impaired Pedestrians

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

Semantic perception of curbs beyond traversability for real-world navigation assistance systems

Intersection Navigation For People With Visual Impairment

An Environmental Perception and Navigational Assistance System for Visually Impaired Persons Based on Semantic Stixels and Sound Interaction

Robustifying Semantic Cognition of Traversability Across Wearable RGB-depth Cameras

Crosswalk navigation for people with visual impairments on a wearable device

Unifying Terrain Awareness for the Visually Impaired through Real-Time Semantic Segmentation

Unifying Visual Localization and Scene Recognition for People with Visual Impairment

Visual Localizer: Outdoor Localization Based on ConvNet Descriptor and Global Optimization for Visually Impaired Pedestrians

Long-Range Traversability Awareness and Low-Lying Obstacle Negotiation with RealSense for the Visually Impaired

Real-time Pedestrian Crossing Lights Detection Algorithm for the Visually Impaired

Expanding the Detection of Traversable Area with RealSense for the Visually Impaired

A New Approach of Point Cloud Processing and Scene Segmentation for Guiding the Visually Impaired

Visual Localization of Key Positions for Visually Impaired People

KrNet: A Kinetic Real-Time Convolutional Neural Network for Navigational Assistance

Rapid Detection of Blind Roads and Crosswalks by Using a Lightweight Semantic Segmentation Network

A Joint Object Detection and Semantic Segmentation Model with Cross-Attention and Inner-Attention Mechanisms

Can We Unify Perception and Localization in Assisted Navigation? An Indoor Semantic Visual Positioning System for Visually Impaired People

Illuminating Pedestrians via Simultaneous Detection & Segmentation

Panoptic Lintention Network: Towards Efficient Navigational Perception for the Visually Impaired