Abstract: We aim to train a multi-task model whose compute budget and relative task importance users can adjust after deployment, without retraining. This allows performance to be optimized for dynamically varying user needs without the heavy computational overhead of training and storing a separate model for each scenario. To this end, we propose a multi-task model consisting of a shared encoder and task-specific decoders in which both encoder and decoder channel widths are slimmable. Our key idea is to control task importance by varying the capacities of the task-specific decoders, while controlling the total computational cost by jointly adjusting the encoder capacity. This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures based on the user's constraints. Our training strategy involves a novel 'Configuration-Invariant Knowledge Distillation' loss that enforces backbone representations to be invariant under different runtime width configurations, which enhances accuracy. Further, we present a simple but effective search algorithm that translates user constraints into runtime width configurations of both the shared encoder and the task decoders, from which the sub-architectures are sampled. The key rule of the search algorithm is to give a larger computational budget to the decoder of a more highly preferred task, while searching for a shared encoder configuration that enhances overall MTL performance. Experiments on three multi-task benchmarks (PASCALContext, NYUD-v2, and CIFAR100-MTL) with diverse backbone architectures demonstrate the advantage of our approach. For example, our method improves controllability by ~33.5% over prior methods on the NYUD-v2 dataset, while incurring much lower compute cost.
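
The search rule described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the width multipliers in `WIDTHS`, the quadratic toy cost model in `cost`, the function name `search_config`, and the tie-breaking order (widest feasible encoder first, then preference-weighted decoder width). It only shows the general idea of giving wider decoders to more preferred tasks while fitting a shared encoder into the remaining budget.

```python
from itertools import product

# Hypothetical slimmable width multipliers (assumption, not from the paper).
WIDTHS = (0.25, 0.5, 0.75, 1.0)


def cost(encoder_width, decoder_widths, enc_flops=8.0, dec_flops=1.0):
    """Toy cost model (assumption): cost grows quadratically with channel width."""
    decoder_cost = sum(dec_flops * w ** 2 for w in decoder_widths)
    return enc_flops * encoder_width ** 2 + decoder_cost


def respects_preferences(preferences, decoder_widths):
    """A more preferred task must never get a narrower decoder than a less preferred one."""
    pairs = list(zip(preferences, decoder_widths))
    return all(w_i >= w_j for p_i, w_i in pairs for p_j, w_j in pairs if p_i > p_j)


def search_config(preferences, budget):
    """Exhaustively pick (encoder_width, decoder_widths) that fits the budget.

    Among feasible configurations, prefer the widest encoder, then the
    largest preference-weighted decoder width. Returns None if nothing fits.
    """
    best, best_key = None, None
    for enc in WIDTHS:
        for decs in product(WIDTHS, repeat=len(preferences)):
            if cost(enc, decs) > budget:
                continue  # violates the user's compute budget
            if not respects_preferences(preferences, decs):
                continue  # violates the task-preference ordering
            key = (enc, sum(p * w for p, w in zip(preferences, decs)))
            if best_key is None or key > best_key:
                best, best_key = (enc, decs), key
    return best
```

For instance, `search_config((0.7, 0.3), budget=6.0)` assigns the first task a wider decoder than the second, and swapping the preferences swaps the decoder widths. An exhaustive search is used only because the toy configuration space is tiny; the paper's actual search procedure may differ.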

Exploring Relational Context for Multi-Task Dense Prediction

Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models

Cross-Task Affinity Learning for Multitask Dense Scene Predictions

Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization

Multi-task neural networks by learned contextual inputs

DenseMTL: Cross-task Attention Mechanism for Dense Multi-task Learning

Learning from Semantically Dependent Multi-Tasks

Rethinking of Feature Interaction for Multi-task Learning on Dense Prediction

Task-Conditional Adapter for Multi-Task Dense Prediction

UNITE: Multitask Learning with Sufficient Feature for Dense Prediction

InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding

Spatially-Aware Context Neural Networks

Efficient Controllable Multi-Task Architectures

Multi-task Learning with 3D-Aware Regularization

Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs

Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions

TFUT: Task fusion upward transformer model for multi-task learning on dense prediction

Adaptive and Dynamic Knowledge Transfer in Multi-task Learning with Attention Networks

Unleashing the Power of Context: Contextual Association Network with Cross-Task Attention for Joint Relational Extraction

Improving Multiple Dense Prediction Performances by Exploiting Inter-Task Synergies for Neuromorphic Vision Sensors

Efficient Spatialtemporal Context Modeling for Action Recognition