Abstract: Convolutional Neural Networks (CNNs) are successfully used for the important automotive visual perception tasks including object recognition, motion and depth estimation, visual SLAM, etc. However, these tasks are typically independently explored and modeled. In this paper, we propose a joint multi-task network design for learning several tasks simultaneously. Our main motivation is the computational efficiency achieved by sharing the expensive initial convolutional layers between all tasks. Indeed, the main bottleneck in automated driving systems is the limited processing power available on deployment hardware. There is also some evidence for other benefits in improving accuracy for some tasks and easing development effort. It also offers scalability to add more tasks leveraging existing features and achieving better generalization. We survey various CNN based solutions for visual perception tasks in automated driving. Then we propose a unified CNN model for the important tasks and discuss several advanced optimization and architecture design techniques to improve the baseline model. The paper is partly review and partly positional with demonstration of several preliminary results promising for future research. We first demonstrate results of multi-stream learning and auxiliary learning which are important ingredients to scale to a large multi-task model. Finally, we implement a two-stream three-task network which performs better in many cases compared to their corresponding single-task models, while maintaining network size.
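The efficiency argument in the abstract (run the expensive encoder once and share it across tasks) can be illustrated with simple arithmetic. The sketch below is only a back-of-the-envelope model: the GFLOP figures, the task names, and the helper functions `separate_cost` and `shared_cost` are hypothetical and are not taken from the paper.

```python
from typing import Mapping

# Hypothetical per-frame costs in GFLOPs. These numbers are illustrative
# assumptions, not measurements reported by the paper.
ENCODER_GFLOPS = 8.0
DECODER_GFLOPS = {"segmentation": 1.5, "detection": 1.0, "depth": 1.2}


def separate_cost(encoder: float, decoders: Mapping[str, float]) -> float:
    """Cost when every task runs its own encoder plus its own decoder."""
    return sum(encoder + decoder for decoder in decoders.values())


def shared_cost(encoder: float, decoders: Mapping[str, float]) -> float:
    """Cost when one encoder runs once and feeds every task decoder."""
    return encoder + sum(decoders.values())


if __name__ == "__main__":
    sep = separate_cost(ENCODER_GFLOPS, DECODER_GFLOPS)
    shr = shared_cost(ENCODER_GFLOPS, DECODER_GFLOPS)
    print(f"separate networks: {sep:.1f} GFLOPs per frame")
    print(f"shared encoder:    {shr:.1f} GFLOPs per frame")
    print(f"relative saving:   {100 * (1 - shr / sep):.0f}%")
```

Under these assumed numbers the saving grows with the number of tasks, because each added task pays only for its decoder rather than for a full network.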

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper addresses unified modeling of visual perception tasks in autonomous driving. Specifically, it proposes a joint multi-task network design named NeurAll that learns several important visual perception tasks simultaneously. The main motivations are as follows:

1. **Computational Efficiency**: Sharing the expensive initial convolutional layers across all tasks significantly improves computational efficiency. This matters because the processing power of deployment hardware in autonomous driving systems is limited.
2. **Accuracy Improvement**: There is some evidence that multi-task learning can improve the accuracy of certain tasks and ease development effort.
3. **Scalability**: More tasks can be added easily, leveraging existing features to achieve better generalization.

### Main Contributions

1. **Unified Model Design**: The paper proposes a unified CNN model to handle key visual perception tasks in autonomous driving, such as object recognition, motion estimation, depth estimation, and localization.
2. **Multi-Stream Learning**: A multi-stream architecture is introduced to capture temporal cues by processing consecutive frames, further enhancing model performance.
3. **Auxiliary Learning**: The performance of primary tasks (e.g., semantic segmentation) is enhanced by introducing auxiliary tasks (e.g., depth regression).
4. **Experimental Validation**: Experiments demonstrate the performance improvement of the multi-task model on multiple datasets, particularly in video segmentation and semantic segmentation tasks.

### Experimental Results

1. **Multi-Stream Learning**: In video segmentation, the multi-stream model improves on the single-stream model by 11% and 4% on the KITTI and SYNTHIA validation sets, with only a slight increase in computational complexity.
2. **Auxiliary Learning**: Introducing depth regression as an auxiliary task improves semantic segmentation, with IoU increasing by 4% and 3% on the KITTI and SYNTHIA validation sets.
3. **Comparison of Multi-Task and Single-Task Models**: The two-stream three-task unified model outperforms the corresponding single-task models in many cases while maintaining network size.

### Conclusion

With the NeurAll model, the paper demonstrates the potential of multi-task learning for visual perception in autonomous driving. The model improves computational efficiency, enhances accuracy on some tasks, and scales to additional tasks. Future research directions include building larger multi-task models and providing dataset support for more tasks.
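The auxiliary-learning idea summarized above, training semantic segmentation together with a depth regression task, is commonly expressed as a weighted sum of the two task losses. The following is a minimal sketch under that assumption; the function names, the choice of a squared-error depth loss, and the `aux_weight` value are illustrative and are not confirmed to match the paper's formulation.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of the true class over all pixels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def depth_l2(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and ground-truth depth."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def joint_loss(
    seg_probs: Sequence[Sequence[float]],
    seg_labels: Sequence[int],
    depth_pred: Sequence[float],
    depth_target: Sequence[float],
    aux_weight: float = 0.5,  # hypothetical weight for the auxiliary task
) -> float:
    """Primary segmentation loss plus a down-weighted auxiliary depth loss."""
    primary = cross_entropy(seg_probs, seg_labels)
    auxiliary = depth_l2(depth_pred, depth_target)
    return primary + aux_weight * auxiliary


if __name__ == "__main__":
    # Two pixels, two classes: per-pixel class probabilities and true labels.
    seg_probs = [[0.5, 0.5], [0.25, 0.75]]
    seg_labels = [0, 1]
    # Per-pixel depth predictions and targets (arbitrary units).
    depth_pred = [1.0, 3.0]
    depth_target = [2.0, 3.0]
    print(joint_loss(seg_probs, seg_labels, depth_pred, depth_target))
```

In this setup the auxiliary head is used only during training to shape the shared features; setting `aux_weight` to zero recovers the single-task segmentation loss.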

NeurAll: Towards a Unified Visual Perception Model for Automated Driving
