SGW-based Multi-Task Learning in Vision Tasks

Ruiyuan Zhang, Yuyao Chen, Yuchi Huo, Jiaxiang Liu, Dianbing Xi, Jie Liu, Chao Wu
2024-10-03
Abstract: Multi-task learning (MTL) is a multi-target optimization task. Within MTL, neural networks try to realize each target using a shared interpretative space. However, as the scale of datasets expands and the complexity of tasks increases, knowledge sharing becomes increasingly challenging. In this paper, we first re-examine previous cross-attention MTL methods from the perspective of noise. We theoretically analyze this issue and identify it as a flaw in the cross-attention mechanism. To address it, we propose an information bottleneck knowledge extraction module (KEM). This module aims to reduce inter-task interference by constraining the flow of information, thereby also reducing computational complexity. Furthermore, we employ neural collapse to stabilize the knowledge-selection process: before the features are input to KEM, we project them into ETF space. This mapping makes our method more robust. We implemented this method and conducted comparative experiments on multiple datasets. The results demonstrate that our approach significantly outperforms existing methods in multi-task learning.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper attempts to solve the problem of inter-task interference in multi-task learning (MTL). As datasets grow and tasks become more complex, knowledge sharing becomes increasingly challenging, especially in visual tasks. Specifically, the paper addresses the following issues:

1. **Defects of cross-attention mechanisms**: Existing cross-attention mechanisms have difficulty sharing knowledge effectively when dealing with large-scale datasets or complex tasks. In particular, the Softmax operation rarely sets weights to zero, so irrelevant features still receive some weight and introduce unnecessary noise that interferes with task performance.
2. **Noise interference between tasks**: Information that is irrelevant across tasks is converted into noise by the Softmax operation and degrades the performance of other tasks. For example, information irrelevant to one task may become noise that affects the learning process of another task.
3. **High computational complexity**: The computational complexity of traditional cross-attention mechanisms grows quadratically. As the number and complexity of tasks increase, the computational requirements rise sharply.

To address these problems, the authors propose an information bottleneck knowledge extraction module (KEM) and further enhance its stability by introducing neural collapse. The specific solutions are:

- **Information bottleneck knowledge extraction module (KEM)**: KEM restricts the information flow to reduce inter-task interference, which also reduces computational complexity. It consists of three steps: retrieve, write, and broadcast. These steps work together to select useful information, eliminate invalid noise information, and resist inter-task interference.
- **Stable knowledge extraction module (sKEM)**: By projecting input features into the equiangular tight frame (ETF) space, the memory can select features with fewer statistical characteristics. This makes the model more robust and improves its resistance to interference from unbalanced inputs.

In summary, this paper aims to solve the problem of inter-task interference in multi-task learning by proposing KEM and sKEM, and verifies the effectiveness and superiority of these two methods through experiments.
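
Two of the ideas above can be illustrated with a small, hedged sketch. It is not the authors' implementation: the function names (`softmax`, `noise_mass`, `simplex_etf`, `dot`) and the toy scores are assumptions chosen for illustration. The first part shows that Softmax gives every position a strictly positive weight, so the total weight on irrelevant positions grows as more of them are added. The second part builds a standard simplex ETF, the geometry associated with neural collapse, and checks its defining properties. How the paper actually projects features into ETF space is not described in this summary.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def noise_mass(scores, relevant):
    """Total softmax weight on positions NOT in `relevant` (toy noise measure)."""
    weights = softmax(scores)
    return sum(w for i, w in enumerate(weights) if i not in relevant)


def simplex_etf(k):
    """Rows of the k x k simplex ETF: sqrt(k / (k - 1)) * (I - (1 / k) * ones)."""
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    # Assumed toy setup: one relevant position with score 4.0,
    # all other positions irrelevant with score 0.0.
    for n_irrelevant in (10, 100, 1000):
        scores = [4.0] + [0.0] * n_irrelevant
        print(n_irrelevant, round(noise_mass(scores, {0}), 3))

    # Every ETF vector has unit norm; every distinct pair has
    # inner product -1 / (k - 1), i.e. all pairs are equally separated.
    etf = simplex_etf(4)
    print(round(dot(etf[0], etf[0]), 6), round(dot(etf[0], etf[1]), 6))
```

In the toy setup no single irrelevant position carries much weight, but their combined share keeps rising with their number. This is the kind of accumulated noise that a bottleneck selecting only a few features is meant to suppress.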