Abstract: Multi-Task Learning (MTL) involves the concurrent training of multiple tasks, offering notable advantages for dense prediction tasks in computer vision. MTL not only reduces training and inference time as opposed to having multiple single-task models, but also enhances task accuracy through the interaction of multiple tasks. However, existing methods face limitations. They often rely on suboptimal cross-task interactions, resulting in task-specific predictions with poor geometric and predictive coherence. In addition, many approaches use inadequate loss weighting strategies, which do not address the inherent variability in task evolution during training. To overcome these challenges, we propose an advanced MTL model specifically designed for dense vision tasks. Our model leverages state-of-the-art vision transformers with task-specific decoders. To enhance cross-task coherence, we introduce a trace-back method that improves both cross-task geometric and predictive features. Furthermore, we present a novel dynamic task balancing approach that projects task losses onto a common scale and prioritizes more challenging tasks during training. Extensive experiments demonstrate the superiority of our method, establishing new state-of-the-art performance across two benchmark datasets. The code is available at: https://github.com/Klodivio355/MT-CP

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to solve two main problems in multi-task learning (MTL) for dense visual prediction tasks:

1. **Cross-Task Coherence**:
   - **Geometric Consistency**: Existing MTL methods often fail to maintain good geometric consistency between tasks when dealing with dense visual tasks. For example, the feature representations of different tasks may be inconsistent in spatial structure, resulting in inconsistent geometric properties (such as edges and boundaries) in the prediction results.
   - **Predictive Consistency**: There is also a lack of consistency in the prediction results between tasks, especially in pixel-level predictions, where the prediction results of different tasks may be contradictory or uncoordinated.
2. **Task Prioritization and Loss Weighting**:
   - Existing MTL methods usually use a fixed loss-weighting strategy, which cannot effectively cope with the different evolution speeds between tasks during the training process. Some tasks may dominate the training process because of the large scale of their loss functions, resulting in poor learning effects for other tasks.
   - Dynamically adjusting task weights can better balance the learning progress of different tasks and ensure that all tasks are fully optimized.

### Solutions

To solve the above problems, the authors propose a new method named MT-CP (Multi-Task Coherence and Prioritization). Specific measures include:

1. **Introducing the Trace-Back mechanism**:
   - Trace back cross-task representations through task-specific decoders to enhance geometric and predictive consistency between tasks. This mechanism helps to further refine and optimize the initial prediction results on the basis of the shared backbone network.
2. **Dynamic task priority adjustment**:
   - A new dynamic loss prioritization scheme (LPS) is proposed, which projects the losses of different tasks onto the same scale and dynamically adjusts the weights according to the difficulty of the tasks. This can ensure that more challenging tasks receive more attention throughout the training process, thereby improving overall performance.

### Experimental Results

The experimental results show that the MT-CP model has achieved significant performance improvements on two benchmark datasets, NYUD-v2 and PASCAL-Context, especially in tasks such as semantic segmentation, depth estimation, and surface normal estimation. These results verify the effectiveness of this method in solving cross-task consistency and task priority adjustment problems.

### Formula Summary

- **Dynamic Loss Prioritization Scheme (LPS)**:
  \[ L_{\text{Log-MTL}} = \sum_{i=1}^{T} \log(1 + w_i) L_i \]
  where \( L_i \) is the loss of the \( i \)-th task, \( w_i \) is the task weight, and the contribution of different tasks is balanced by dynamically adjusting \( w_i \).
- **Task Weight Update Formula**:
  \[ \tilde{w}_i^n = \frac{\prod_{k=1}^{H} \frac{L_i^{n-k+1}}{L^{n-k+1}}}{\prod_{k=1}^{H} \frac{L^{n-k+1}}{L^{n-k}}} \]
  where \( H \) is the task history length, and \( L_i^n \) is the loss of the \( i \)-th task in the \( n \)-th training round.

Through these improvements, the MT-CP model not only improves the efficiency and accuracy of multi-task learning but also enhances the synergy between tasks, making the performance of dense visual prediction tasks better.
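The Log-MTL combination from the formula summary can be illustrated with a minimal Python sketch. It only shows how per-task losses and weights are combined as \( \sum_i \log(1 + w_i) L_i \). The function name `log_mtl_loss` and the example numbers are hypothetical, the weights are taken as given inputs rather than computed by the weight update rule, and this is not the authors' implementation (see the linked repository for that).

```python
import math


def log_mtl_loss(task_losses, task_weights):
    """Combine per-task losses as sum_i log(1 + w_i) * L_i.

    Hypothetical helper for illustration only: both arguments are plain
    lists of floats, one entry per task. Weights must be greater than -1
    so that log(1 + w_i) is defined.
    """
    if len(task_losses) != len(task_weights):
        raise ValueError("need exactly one weight per task loss")
    total = 0.0
    for loss, weight in zip(task_losses, task_weights):
        if weight <= -1.0:
            raise ValueError("each weight must be greater than -1")
        # log1p(w) computes log(1 + w) accurately for small w.
        total += math.log1p(weight) * loss
    return total


# Example with made-up numbers: three tasks whose raw losses differ in scale.
losses = [0.8, 2.5, 0.1]
weights = [1.0, 0.5, 3.0]
combined = log_mtl_loss(losses, weights)
```

Because the weight enters through \( \log(1 + w_i) \), a weight of zero removes a task from the sum, and large weights grow the task's contribution only logarithmically.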

Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization
