Abstract: Multi-task visual perception has a wide range of applications in scene understanding, such as autonomous driving. In this work, we devise an efficient unified framework to solve multiple common perception tasks, including instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation. Simply sharing the same visual feature representations across these tasks impairs their performance, while independent task-specific feature extractors lead to parameter redundancy and added latency. Thus, we design two feature-merge branches to learn feature bases that are useful to, and thus shared by, multiple perception tasks. Each task then takes its corresponding feature basis as the input to its prediction head. In particular, one feature-merge branch is designed for instance-level recognition and the other for dense predictions. To enhance inter-branch communication, the instance branch passes pixel-wise spatial information of each instance to the dense branch using efficient dynamic convolution weighting. Moreover, a simple but effective dynamic routing mechanism is proposed to isolate task-specific features and leverage common properties among tasks. Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception. In addition, as tasks benefit from co-training with each other, our solution achieves on-par results in partially labeled settings on nuScenes and outperforms previous works for 3D detection and depth estimation on the Cityscapes dataset with full supervision.
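The abstract does not spell out how the dynamic convolution weighting is computed. As a rough illustration of the general idea only, the sketch below applies a 1x1 convolution whose kernel is supplied at run time, one kernel per instance, to a shared dense feature map. Everything in it is an assumption made for illustration: the function name `dynamic_conv1x1`, the sigmoid activation, the toy feature values, and the per-instance weights are hypothetical and are not taken from D2BNet.

```python
import math


def dynamic_conv1x1(features, weights, bias=0.0):
    """Apply a 1x1 convolution whose kernel is given at run time.

    features: list of C channels, each an H x W nested list of floats.
    weights:  list of C floats; assumed here to be predicted per instance.
    Returns an H x W map of sigmoid activations (an illustrative choice).
    """
    channels = len(features)
    if len(weights) != channels:
        raise ValueError("need exactly one weight per feature channel")
    height, width = len(features[0]), len(features[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Weighted sum over channels at a single pixel: a 1x1 convolution.
            z = bias + sum(weights[c] * features[c][y][x] for c in range(channels))
            row.append(1.0 / (1.0 + math.exp(-z)))
        out.append(row)
    return out


# Toy dense feature map with 2 channels and 2 x 2 pixels (made-up values).
dense_features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]

# Hypothetical kernels, standing in for weights predicted by an instance branch.
instance_weights = {
    "instance_a": [1.0, -1.0],
    "instance_b": [-1.0, 1.0],
}

# One spatial map per instance, all computed from the same shared features.
instance_maps = {
    name: dynamic_conv1x1(dense_features, w)
    for name, w in instance_weights.items()
}
```

Because the kernel is an input rather than a fixed parameter, the same shared features produce a different spatial map for each instance, which is one plausible way for instance-level cues to reach a dense branch without a separate feature extractor per instance.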

Task-Interaction-Free Multi-Task Learning with Efficient Hierarchical Feature Representation

Network Traffic Classification Via Non-Convex Multi-Task Feature Learning

MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning

HirMTL: Hierarchical Multi-Task Learning for dense scene understanding

HFedMTL: Hierarchical Federated Multi-Task Learning

Rethinking of Feature Interaction for Multi-task Learning on Dense Prediction

Distributed Learning of Predictive Structures from Multiple Tasks over Networks

A Dynamic Feature Interaction Framework for Multi-task Visual Perception

Exploiting High-Order Information in Heterogeneous Multi-Task Feature Learning

Task-driven Image Fusion with Learnable Fusion Loss

Exploiting Task-Feature Co-Clusters In Multi-Task Learning

TFUT: Task fusion upward transformer model for multi-task learning on dense prediction

Hierarchical Deep Multi-task Learning with Attention Mechanism for Similarity Learning

AdaTT: Adaptive Task-to-Task Fusion Network for Multitask Learning in Recommendations

Multi-task learning based on geometric invariance discriminative features

Multi-task Model and Feature Joint Learning

Deep Asymmetric Multi-task Feature Learning

Multi-Task Networks With Universe, Group, and Task Feature Learning

Deep Multi-task Learning for Facial Expression Recognition and Synthesis Based on Selective Feature Sharing

Adaptive and Dynamic Knowledge Transfer in Multi-task Learning with Attention Networks

Deep collaborative multi-task network: A human decision process inspired model for hierarchical image classification