TFUT: Task fusion upward transformer model for multi-task learning on dense prediction
Zewei Xin, Shalayiding Sirejiding, Yuxiang Lu, Yue Ding, Chunlin Wang, Tamam Alsarhan, Hongtao Lu
DOI: https://doi.org/10.1016/j.cviu.2024.104014
IF: 4.886
2024-04-17
Computer Vision and Image Understanding
Abstract: Transformer-based methods have shown great promise for multi-task learning on dense prediction tasks. Well-designed task interaction modules in these methods further improve performance by effectively transferring contextual information between tasks. However, many of these methods do not use the target task to guide the contextual information drawn from the source task. We propose the Task Fusion Upward Transformer (TFUT) model for multi-task learning on dense prediction. To facilitate task interaction, we introduce the Asymmetric Cross Task Interaction module, which uses asymmetric transmission in attention: during similarity calculation, the model uses the target task to guide the expression of contextual information from the source task, so that this context is transmitted effectively. To avoid loss of detail and gradient discontinuity during upsampling, we design the Upward Transformer Decoder, which extracts and aligns multi-scale features using multi-level convolution. Experiments on the NYUD-v2 and PASCAL Context datasets demonstrate the effectiveness of the proposed model, which achieves the best performance in various single-task and multi-task scenarios.
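The abstract does not give the equations of the Asymmetric Cross Task Interaction module. As a rough illustration only, one common way to make cross-task attention asymmetric is to take the queries from the target task's tokens and the keys and values from the source task's tokens. The similarity scores are then computed from the target task's point of view, and the transferred context has one row per target token. The function name `asymmetric_cross_task_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the residual addition are assumptions made for this sketch. They are not the TFUT formulation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain row-by-column matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def asymmetric_cross_task_attention(
    target: Matrix, source: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix
) -> Tuple[Matrix, Matrix]:
    """Hypothetical sketch: the target task queries the source task.

    Queries come from the target task only, so the similarity scores (and
    therefore which source context gets transferred) are guided by the target.
    Swapping the two arguments gives a different result, hence "asymmetric".
    """
    q = matmul(target, w_q)   # (n_target, d)
    k = matmul(source, w_k)   # (n_source, d)
    v = matmul(source, w_v)   # (n_source, d_target), assumed to match target width
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    attn = softmax_rows(scores)  # (n_target, n_source), each row sums to 1
    context = matmul(attn, v)    # source context gathered per target token
    # Assumed residual fusion: add the transferred context to the target features.
    fused = [[t + c for t, c in zip(t_row, c_row)] for t_row, c_row in zip(target, context)]
    return fused, attn
```

With two target tokens and three source tokens, the attention map has shape 2 x 3 and the fused output keeps the target's shape (2 x 2). Swapping the roles yields a 3 x 2 map instead. This direction-dependence is the property the sketch is meant to show.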
computer science, artificial intelligence; engineering, electrical & electronic