LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention

Zewen Du, Zhenjiang Hu, Guiyu Zhao, Ying Jin, Hongbin Ma
2024-11-29
Abstract: Feature upsampling is an essential operation in constructing deep convolutional neural networks. However, existing upsamplers either lack specific feature guidance or necessitate the utilization of high-resolution feature maps, resulting in a loss of performance and flexibility. In this paper, we find that the local self-attention naturally has the feature guidance capability, and its computational paradigm aligns closely with the essence of feature upsampling (i.e., feature reassembly of neighboring points). Therefore, we introduce local self-attention into the upsampling task and demonstrate that the majority of existing upsamplers can be regarded as special cases of upsamplers based on local self-attention. Considering the potential semantic gap between upsampled points and their neighboring points, we further introduce the deformation mechanism into the upsampler based on local self-attention, thereby proposing LDA-AQU. As a novel dynamic kernel-based upsampler, LDA-AQU utilizes the feature of queries to guide the model in adaptively adjusting the position and aggregation weight of neighboring points, thereby meeting the upsampling requirements across various complex scenarios. In addition, LDA-AQU is lightweight and can be easily integrated into various model architectures. We evaluate the effectiveness of LDA-AQU across four dense prediction tasks: object detection, instance segmentation, panoptic segmentation, and semantic segmentation. LDA-AQU consistently outperforms previous state-of-the-art upsamplers, achieving performance enhancements of 1.7 AP, 1.5 AP, 2.0 PQ, and 2.5 mIoU compared to the baseline models in the aforementioned four tasks, respectively. Code is available at https://github.com/duzw9311/LDA-AQU.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the shortcomings of existing upsampling methods in deep convolutional neural networks. Existing upsamplers either lack specific feature guidance or rely on high-resolution feature maps, which costs performance and flexibility. To address this, the authors propose **LDA-AQU** (Local Deformable Attention-based Adaptive Query-guided Upsampling), an adaptive, query-guided upsampling method built on a local deformable attention mechanism.

The main contributions of LDA-AQU are:

1. **Introducing the local self-attention mechanism**: Local self-attention guides the upsampling task, so that each upsampled point aggregates its neighboring points with weights that adapt to the context.
2. **Adding the deformation mechanism**: The positions of the neighboring points are adjusted dynamically, which helps the upsampler cope with the semantic gap between an upsampled point and its neighbors across different complex scenarios.
3. **Lightweight design**: LDA-AQU is a lightweight upsampler that can be easily integrated into various model architectures without significantly increasing the computational cost.

### Advantages of LDA-AQU

- **Single-layer operation**: LDA-AQU operates within a single layer and does not require high-resolution feature maps as input.
- **Query-guiding ability**: It uses the features of the query points to generate dynamic upsampling kernels.
- **Local deformation ability**: It dynamically adjusts the positions of neighboring points according to the context of the query points.
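The computation summarized above (a query attending over a small neighborhood whose sampling positions may be shifted) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses identity query/key/value projections instead of learned ones, takes each query from the nearest source point, and accepts a hypothetical `offset_fn` callable that stands in for the learned offset predictor. The function names `softmax`, `bilinear`, and `local_attention_upsample` are invented for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bilinear(feat, y, x):
    """Sample feat (H x W x C nested lists) at fractional (y, x), clamped to the border."""
    H, W = len(feat), len(feat[0])
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    fy, fx = y - y0, x - x0
    return [
        (1 - fy) * ((1 - fx) * a + fx * b) + fy * ((1 - fx) * c + fx * d)
        for a, b, c, d in zip(feat[y0][x0], feat[y0][x1], feat[y1][x0], feat[y1][x1])
    ]


def local_attention_upsample(feat, scale=2, k=3, offset_fn=None):
    """Upsample feat (H x W x C) to (H*scale) x (W*scale) x C with local attention.

    offset_fn is a hypothetical stand-in for a learned offset predictor: it maps
    a query vector to k*k (dy, dx) pairs. None means no deformation.
    """
    H, W, C = len(feat), len(feat[0]), len(feat[0][0])
    r = k // 2
    out = []
    for i in range(H * scale):
        row = []
        for j in range(W * scale):
            ci, cj = i // scale, j // scale  # nearest source point
            q = feat[ci][cj]                 # query (identity projection, an assumption)
            offsets = offset_fn(q) if offset_fn else [(0.0, 0.0)] * (k * k)
            # Gather the k x k neighborhood, shifted by the per-query offsets.
            neigh = []
            n = 0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    oy, ox = offsets[n]
                    neigh.append(bilinear(feat, ci + di + oy, cj + dj + ox))
                    n += 1
            # Scaled dot-product scores between the query and each sampled neighbor.
            scores = [sum(a * b for a, b in zip(q, v)) / math.sqrt(C) for v in neigh]
            w = softmax(scores)
            # Aggregate the neighbors with the attention weights.
            row.append([sum(wn * v[c] for wn, v in zip(w, neigh)) for c in range(C)])
        out.append(row)
    return out
```

With `k=1` and no offsets, every output point simply copies its nearest source point, so the sketch reduces to nearest-neighbor upsampling. That is one concrete instance of the abstract's claim that existing upsamplers can be regarded as special cases of upsamplers based on local self-attention.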
### Experimental verification

The authors verify the effectiveness of LDA-AQU on four dense prediction tasks: object detection, instance segmentation, panoptic segmentation, and semantic segmentation. The results show that LDA-AQU outperforms previous state-of-the-art upsamplers on these tasks and improves over the baseline models by 1.7 AP, 1.5 AP, 2.0 PQ, and 2.5 mIoU, respectively.

### Summary

By introducing the local self-attention mechanism and the deformation mechanism, LDA-AQU addresses the lack of feature guidance and the limited flexibility of existing upsampling methods, and it improves results over the baseline models on all four dense prediction tasks.