Abstract: Learning discriminative task-specific features simultaneously for multiple distinct tasks is a fundamental problem in multi-task learning. Recent state-of-the-art models directly decode task-specific features from one shared task-generic feature (e.g., a feature from a backbone layer), and use carefully designed decoders to produce multi-task features. However, because the input feature is fully shared and each task decoder also shares its decoding parameters across input samples, the feature decoding process is static and produces less discriminative task-specific representations. To tackle this limitation, we propose TaskExpert, a novel multi-task mixture-of-experts model that learns multiple representative task-generic feature spaces and decodes task-specific features in a dynamic manner. Specifically, TaskExpert introduces a set of expert networks to decompose the backbone feature into several representative task-generic features. The task-specific features are then decoded by dynamic task-specific gating networks operating on the decomposed task-generic features. Furthermore, to establish long-range modeling of the task-specific representations from different layers of TaskExpert, we design a multi-task feature memory that is updated at each layer and acts as an additional feature expert for dynamic task-specific feature decoding. Extensive experiments demonstrate that TaskExpert clearly outperforms previous best-performing methods on all 9 metrics of two competitive multi-task learning benchmarks for visual scene understanding (i.e., PASCAL-Context and NYUD-v2). Code and models will be made publicly available at https://github.com/prismformore/Multi-Task-Transformer
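The decoding scheme described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name `ToyTaskExpertLayer`, the linear experts, the linear gates fed by the backbone feature, the zero-initialized memory, and the additive memory update are all assumptions made for illustration, since the abstract does not specify these details. The sketch only shows the general pattern: shared experts produce several task-generic features, and each task mixes them (plus its own memory slot, treated as one extra expert) with per-sample gate weights.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyTaskExpertLayer:
    """Hypothetical single layer: linear experts plus per-task linear gates."""

    def __init__(self, dim, num_experts, num_tasks, seed=0):
        rng = np.random.default_rng(seed)
        scale = dim ** -0.5
        # One (dim x dim) projection per expert, shared by all tasks.
        self.expert_w = rng.normal(0.0, scale, size=(num_experts, dim, dim))
        # One gate per task; it scores every expert plus the memory slot.
        self.gate_w = rng.normal(0.0, scale, size=(num_tasks, dim, num_experts + 1))

    def __call__(self, x, memory):
        # x: (tokens, dim) backbone feature; memory: (tasks, tokens, dim).
        expert_feats = np.einsum("nd,edk->enk", x, self.expert_w)  # (E, N, D)
        outputs = []
        for t in range(memory.shape[0]):
            # Candidates for task t: E expert features plus its memory feature.
            candidates = np.concatenate([expert_feats, memory[t][None]], axis=0)
            # Gate weights depend on the input, so the mixture varies per token.
            gate = softmax(x @ self.gate_w[t], axis=-1)  # (N, E + 1)
            outputs.append(np.einsum("ne,end->nd", gate, candidates))
        task_feats = np.stack(outputs)  # (T, N, D)
        # Assumption: a simple additive memory update between layers.
        new_memory = memory + task_feats
        return task_feats, new_memory


# Usage: 4 tokens of width 8, 3 experts, 2 tasks, memory starting at zero.
x = np.random.default_rng(1).normal(size=(4, 8))
memory = np.zeros((2, 4, 8))
layer = ToyTaskExpertLayer(dim=8, num_experts=3, num_tasks=2)
task_feats, memory = layer(x, memory)
print(task_feats.shape)
```

Stacking several such layers and passing `memory` from one to the next would mimic the cross-layer role the abstract assigns to the multi-task feature memory, again under the assumptions stated above.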

Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

Modeling Task Relationships in Multivariate Soft Sensor with Balanced Mixture-of-Experts

Multi-Task Learning with Calibrated Mixture of Insightful Experts

Spatial-Temporal Graph Multi-Gate Mixture-of-Expert Model for Traffic Prediction

Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners

AC-MMOE: A Multi-gate Mixture-of-experts Model Based on Attention and Convolution

HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

Research on Joint Recommendation Algorithm for Knowledge Concepts and Learning Partners Based on Improved Multi-Gate Mixture-of-Experts

Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts

MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts

M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework

TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts

DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging

Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning

Multi-task Model and Feature Joint Learning

On Better Exploring and Exploiting Task Relationships in Multitask Learning: Joint Model and Feature Learning

Modeling Output-Level Task Relatedness in Multi-Task Learning with Feedback Mechanism

A Model-Agnostic Approach to Mitigate Gradient Interference for Multi-Task Learning