Abstract: We aim to train a multi-task model such that users can adjust the desired compute budget and the relative importance of task performances after deployment, without retraining. This enables optimizing performance for dynamically varying user needs, without the heavy computational overhead of training and storing separate models for each scenario. To this end, we propose a multi-task model consisting of a shared encoder and task-specific decoders, where both encoder and decoder channel widths are slimmable. Our key idea is to control task importance by varying the capacities of the task-specific decoders, while controlling the total computational cost by jointly adjusting the encoder capacity. This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures based on the user's constraints. Our training strategy involves a novel 'Configuration-Invariant Knowledge Distillation' loss that enforces backbone representations to be invariant under different runtime width configurations, which enhances accuracy. Further, we present a simple but effective search algorithm that translates user constraints into runtime width configurations of both the shared encoder and the task decoders, for sampling the sub-architectures. The key rule of the search algorithm is to give a larger computational budget to the decoder of a more preferred task, while searching for a shared encoder configuration that enhances the overall multi-task learning (MTL) performance. Experiments on three multi-task benchmarks (PASCALContext, NYUD-v2, and CIFAR100-MTL) with diverse backbone architectures demonstrate the advantage of our approach. For example, our method achieves ~33.5% higher controllability on the NYUD-v2 dataset than prior methods, while incurring much lower compute cost.
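
The search rule described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the width set `WIDTHS`, the quadratic cost model in `cost`, the function name `search_config`, and the tie-breaking score (largest affordable encoder first, then preference-weighted decoder width) are all assumptions made for illustration. It only shows how task preferences and a compute budget might be translated into encoder and decoder width multipliers.

```python
from itertools import product

# Hypothetical set of runtime width multipliers (an assumption, not from the paper).
WIDTHS = (0.25, 0.5, 0.75, 1.0)


def cost(encoder_width, decoder_widths, enc_flops=4.0, dec_flops=1.0):
    """Toy cost model (assumed): cost scales quadratically with channel width."""
    encoder_cost = enc_flops * encoder_width ** 2
    decoder_cost = sum(dec_flops * w ** 2 for w in decoder_widths)
    return encoder_cost + decoder_cost


def search_config(preferences, budget):
    """Exhaustively pick (encoder_width, decoder_widths) within the budget.

    Rule taken from the abstract: a more preferred task never gets a
    narrower decoder than a less preferred one.
    Assumed scoring: prefer the widest encoder that fits, then the
    largest preference-weighted sum of decoder widths.
    """
    n = len(preferences)
    best = None
    for enc_w in WIDTHS:
        for dec_ws in product(WIDTHS, repeat=n):
            if cost(enc_w, dec_ws) > budget:
                continue  # over the compute budget
            violates_order = any(
                preferences[i] > preferences[j] and dec_ws[i] < dec_ws[j]
                for i in range(n)
                for j in range(n)
            )
            if violates_order:
                continue  # a preferred task would get the smaller decoder
            weighted = sum(p * w for p, w in zip(preferences, dec_ws))
            score = (enc_w, weighted)
            if best is None or score > best[0]:
                best = (score, enc_w, dec_ws)
    if best is None:
        raise ValueError("no width configuration fits the budget")
    return best[1], best[2]


if __name__ == "__main__":
    # Two tasks, the first strongly preferred, under a budget of 3.0 toy units.
    print(search_config((0.8, 0.2), budget=3.0))  # (0.75, (0.75, 0.25))
```

Under this toy cost model, swapping the preferences to `(0.2, 0.8)` swaps the decoder widths while the encoder width stays at 0.75, which mirrors the idea of steering task importance through decoder capacity while the encoder absorbs the remaining budget.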

Learning Compact Neural Networks with Deep Overparameterised Multitask Learning

Training Compact Neural Networks via Auxiliary Overparameterization

Distributed Jointly Sparse Multitask Learning over Networks

Low-Rank Deep Convolutional Neural Network for Multi-Task Learning

Distributed Learning of Predictive Structures from Multiple Tasks over Networks

Deep multi-task learning with flexible and compact architecture search

Learning Sparse Sharing Architectures for Multiple Tasks

Dynamic Multi-Task Learning with Convolutional Neural Network

Efficient Computation Sharing for Multi-Task Visual Scene Understanding

A unified architecture for natural language processing: Deep neural networks with multitask learning

UNITE: Multitask Learning with Sufficient Feature for Dense Prediction

Multitask Learning With Enhanced Modules

Proximal Multitask Learning Over Distributed Networks With Jointly Sparse Structure

Adaptive Hard Parameter Sharing Method Based on Multi-Task Deep Learning

Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression

Less is More -- Towards parsimonious multi-task models using structured sparsity

EMT-NAS: Transferring Architectural Knowledge Between Tasks from Different Datasets

Efficient Controllable Multi-Task Architectures

Re-training and parameter sharing with the Hash trick for compressing convolutional neural networks

Novel Multitask Conditional Neural-Network Surrogate Models for Expensive Optimization