Abstract: Multi-task learning has emerged as a powerful paradigm to solve a range of tasks simultaneously with good efficiency in both computation resources and inference time. However, these algorithms are designed for different tasks mostly not within the scope of autonomous driving, thus making it hard to compare multi-task methods in autonomous driving. Aiming to enable the comprehensive evaluation of present multi-task learning methods in autonomous driving, we extensively investigate the performance of popular multi-task methods on the large-scale driving dataset, which covers four common perception tasks, i.e., object detection, semantic segmentation, drivable area segmentation, and lane detection. We provide an in-depth analysis of current multi-task learning methods under different common settings and find out that the existing methods make progress but there is still a large performance gap compared with single-task baselines. To alleviate this dilemma in autonomous driving, we present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting to guide the model toward learning high-quality task-specific representations. Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories and further mitigate the performance gap. Furthermore, we bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving. Comprehensive experimental results on the diverse self-driving dataset BDD100K show that the VE-Prompt improves the multi-task baseline and further surpasses single-task models.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the evaluation and improvement of Multi-task Learning (MTL) in autonomous driving. Specifically:

1. **Performance Evaluation**:
   - Most existing multi-task learning algorithms are not specifically designed for autonomous driving, making it difficult to compare multi-task methods in this field.
   - To comprehensively evaluate existing multi-task learning methods in autonomous driving, the authors studied various popular multi-task learning methods on a large-scale driving dataset, covering four common perception tasks: object detection, semantic segmentation, drivable area segmentation, and lane detection.
2. **Performance Gap**:
   - The study found that although existing multi-task learning methods have made some progress, there is still a significant performance gap compared to single-task baselines.
3. **Improvement Methods**:
   - To address this issue, the authors proposed an effective multi-task framework, VE-Prompt (Visual Exemplar Driven Task-Prompting), which guides the model to learn high-quality task-specific representations by introducing task-specific visual exemplars.
   - Specifically, VE-Prompt generates visual exemplars based on bounding boxes and color-based markers, providing accurate visual appearances of target categories and further narrowing the performance gap.
   - Additionally, the authors achieved efficient and accurate unified perception by bridging Transformer-based encoders and convolutional layers.

### Main Contributions

1. **In-depth Analysis**:
   - Provided an in-depth analysis of current multi-task learning methods under multiple settings, including three common multi-task data partition settings, two partial label learning methods, three task scheduling techniques, and three task balancing strategies.
2. **Effective Framework**:
   - Proposed the VE-Prompt framework, which utilizes visual exemplars to provide task-specific visual cues, guiding the model to learn high-quality task-specific representations.
3. **Performance Improvement**:
   - The VE-Prompt framework significantly outperformed competitive multi-task methods on all tasks and even surpassed single-task models on certain tasks.

### Related Work

1. **Multi-task Learning**:
   - Multi-task learning trains multiple tasks jointly with shared parameters, exploiting information shared between tasks to improve efficiency and accuracy.
   - Well-known multi-task learning models include Mask R-CNN and YOLOP.
2. **Visual Perception in Autonomous Driving**:
   - Autonomous driving relies on perception systems to gather information and understand the environment. Visual perception provides high-resolution images, which can support almost all tasks required for autonomous driving.
   - Running a separate model for each perception task wastes time and computational resources, which motivates the development of unified perception systems.
3. **Prompt-based Learning**:
   - Prompt-based learning aims to bridge the gap between pre-training and model fine-tuning. Models like GPT-3 handle downstream tasks through differently designed text prompts.
   - Recent work has achieved strong performance in zero-shot image classification by injecting visual categories as prompts.

### Experimental Setup

1. **Dataset**:
   - Experiments were conducted on the BDD100K dataset, which contains approximately 74k training images, covering object detection, semantic segmentation, drivable area segmentation, and lane detection tasks.
2. **Evaluation Metrics**:
   - In addition to reporting the performance of each task, a comprehensive multi-task performance metric \(\Delta_{\text{MTL}}\) was adopted to evaluate overall multi-task performance.
3. **Experimental Results**:
   - Experimental results showed that VE-Prompt significantly outperformed other multi-task methods in all settings, particularly excelling in object detection and semantic segmentation tasks.

### Conclusion

This paper provides an in-depth analysis of existing multi-task learning methods and proposes a new framework, VE-Prompt, which narrows the performance gap between multi-task and single-task models in autonomous driving and supports the development of unified perception systems.
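The summary above names the overall metric \(\Delta_{\text{MTL}}\) without stating its formula. In the multi-task learning literature it is commonly defined as the average relative change of the multi-task model over single-task baselines, \(\Delta_{\text{MTL}} = \frac{1}{T}\sum_{i=1}^{T} (-1)^{l_i}\,(M_{m,i} - M_{b,i}) / M_{b,i}\), where \(l_i = 1\) if a lower value is better for task \(i\) and \(0\) otherwise. Whether the paper uses exactly this form is an assumption here. The following is a minimal sketch of that common definition; the function name `delta_mtl` and all numbers are hypothetical and are not results from the paper.

```python
from typing import Sequence


def delta_mtl(
    multi_task: Sequence[float],
    single_task: Sequence[float],
    lower_is_better: Sequence[bool],
) -> float:
    """Average relative change (in percent) of a multi-task model
    over its single-task baselines, one entry per task.

    Assumes the commonly used definition of Delta_MTL; the paper's
    exact formula is not given in the summary.
    """
    if not (len(multi_task) == len(single_task) == len(lower_is_better)):
        raise ValueError("all inputs must have one entry per task")
    if len(multi_task) == 0:
        raise ValueError("at least one task is required")

    total = 0.0
    for m, b, lower in zip(multi_task, single_task, lower_is_better):
        # Flip the sign for metrics where a smaller value is better,
        # so that a positive contribution always means improvement.
        sign = -1.0 if lower else 1.0
        total += sign * (m - b) / b
    return 100.0 * total / len(multi_task)


if __name__ == "__main__":
    # Hypothetical scores for four tasks (detection, semantic segmentation,
    # drivable area segmentation, lane detection); higher is better for all.
    multi = [33.0, 60.0, 84.0, 25.0]
    single = [30.0, 60.0, 80.0, 25.0]
    print(f"Delta_MTL = {delta_mtl(multi, single, [False] * 4):.2f}%")
```

A positive value means the multi-task model is, on average, better than the single-task baselines, and a negative value corresponds to the performance gap discussed above.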

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
