Abstract: Learning discriminative task-specific features simultaneously for multiple distinct tasks is a fundamental problem in multi-task learning. Recent state-of-the-art models directly decode task-specific features from one shared task-generic feature (e.g., a feature from a backbone layer) and use carefully designed decoders to produce multi-task features. However, because the input feature is fully shared and each task decoder also shares its decoding parameters across input samples, the feature decoding process is static and produces less discriminative task-specific representations. To tackle this limitation, we propose TaskExpert, a novel multi-task mixture-of-experts model that learns multiple representative task-generic feature spaces and decodes task-specific features in a dynamic manner. Specifically, TaskExpert introduces a set of expert networks to decompose the backbone feature into several representative task-generic features. The task-specific features are then decoded by dynamic task-specific gating networks operating on the decomposed task-generic features. Furthermore, to establish long-range modeling of the task-specific representations from different layers of TaskExpert, we design a multi-task feature memory that is updated at each layer and acts as an additional feature expert for dynamic task-specific feature decoding. Extensive experiments demonstrate that TaskExpert clearly outperforms previous best-performing methods on all 9 metrics of two competitive multi-task learning benchmarks for visual scene understanding (i.e., PASCAL-Context and NYUD-v2). Code and models will be made publicly available at https://github.com/prismformore/Multi-Task-Transformer
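To make the decoding scheme in the abstract concrete, the following is a minimal sketch of one gated decoding layer, written in plain Python. It is not the authors' implementation. The linear experts, the linear gating maps, the task names, the feature size, and the memory update rule (overwriting the memory with the newly decoded feature) are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def decode_task_features(x, experts, gates, memory):
    """One layer of gated decoding (illustrative only).

    x       : shared backbone feature for one sample, length d
    experts : K matrices (d x d); each yields one task-generic feature
    gates   : dict task -> matrix ((K + 1) x d) producing gating logits
    memory  : dict task -> vector of length d, used as an extra expert
    """
    expert_feats = [matvec(w, x) for w in experts]
    decoded, weights = {}, {}
    for task, gate in gates.items():
        # The task's memory vector joins the K expert outputs as a candidate.
        candidates = expert_feats + [memory[task]]
        # Gating weights are computed from the input, so they vary per sample.
        w = softmax(matvec(gate, x))
        decoded[task] = [
            sum(wk * feat[i] for wk, feat in zip(w, candidates))
            for i in range(len(x))
        ]
        weights[task] = w
    return decoded, weights


rng = random.Random(0)
d, num_experts = 4, 3                    # assumed sizes
tasks = ["segmentation", "depth"]        # assumed task names
experts = [random_matrix(d, d, rng) for _ in range(num_experts)]
gates = {t: random_matrix(num_experts + 1, d, rng) for t in tasks}
memory = {t: [0.0] * d for t in tasks}

x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
decoded, weights = decode_task_features(x, experts, gates, memory)

# Assumed memory update: overwrite with the newly decoded task feature,
# so the next layer could use it as its extra expert.
memory = {t: decoded[t] for t in tasks}
```

Because the gating weights are a function of `x`, two different input samples generally mix the same experts in different proportions, which is the sense in which decoding is dynamic rather than static.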

Task-Conditional Adapter for Multi-Task Dense Prediction

Prompt Guided Transformer for Multi-Task Dense Prediction

Task Indicating Transformer for Task-conditional Dense Predictions

VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Task-conditioned adaptation of visual features in multi-task policy learning

Atten-Adapter: A Unified Attention-Based Adapter for Efficient Tuning

ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

Exploring Relational Context for Multi-Task Dense Prediction

Rethinking of Feature Interaction for Multi-task Learning on Dense Prediction

Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision

Vision Transformer Adapter for Dense Predictions

Adaptive Task-Wise Message Passing for Multi-Task Learning: A Spatial Interaction Perspective

TFUT: Task fusion upward transformer model for multi-task learning on dense prediction

Adapter Tuning with Task-Aware Attention Mechanism

GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph

UNITE: Multitask Learning with Sufficient Feature for Dense Prediction

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts

M-adapter: Multi-level image-to-video adaptation for video action recognition

ViT-Adapter: Exploring Plain Vision Transformer for Accurate Dense Predictions