Abstract: Feature upsampling is an essential operation in constructing deep convolutional neural networks. However, existing upsamplers either lack specific feature guidance or necessitate the utilization of high-resolution feature maps, resulting in a loss of performance and flexibility. In this paper, we find that the local self-attention naturally has the feature guidance capability, and its computational paradigm aligns closely with the essence of feature upsampling (i.e., feature reassembly of neighboring points). Therefore, we introduce local self-attention into the upsampling task and demonstrate that the majority of existing upsamplers can be regarded as special cases of upsamplers based on local self-attention. Considering the potential semantic gap between upsampled points and their neighboring points, we further introduce the deformation mechanism into the upsampler based on local self-attention, thereby proposing LDA-AQU. As a novel dynamic kernel-based upsampler, LDA-AQU utilizes the feature of queries to guide the model in adaptively adjusting the position and aggregation weight of neighboring points, thereby meeting the upsampling requirements across various complex scenarios. In addition, LDA-AQU is lightweight and can be easily integrated into various model architectures. We evaluate the effectiveness of LDA-AQU across four dense prediction tasks: object detection, instance segmentation, panoptic segmentation, and semantic segmentation. LDA-AQU consistently outperforms previous state-of-the-art upsamplers, achieving performance enhancements of 1.7 AP, 1.5 AP, 2.0 PQ, and 2.5 mIoU compared to the baseline models in the aforementioned four tasks, respectively. Code is available at https://github.com/duzw9311/LDA-AQU.
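The abstract's view of upsampling as feature reassembly of neighboring points can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes no learned query/key/value projections, takes the query from the nearest low-resolution point, and drops neighbors that fall outside the map. The function name `local_attention_upsample` and its parameters are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention_upsample(feat, scale=2, k=3):
    """Upsample an H x W x C nested-list feature map by `scale`.

    Each output point attends to the k x k low-resolution neighborhood
    around the point it falls on (assumption: identity projections).
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    r = k // 2
    out = []
    for i in range(h * scale):
        row = []
        for j in range(w * scale):
            ci, cj = i // scale, j // scale      # low-res point under (i, j)
            query = feat[ci][cj]                 # assumption: nearest-neighbor query
            # Gather in-bounds neighbors; they act as both keys and values.
            neighbors = [
                feat[ci + di][cj + dj]
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
                if 0 <= ci + di < h and 0 <= cj + dj < w
            ]
            # Scaled dot-product similarity between the query and each key.
            scores = [
                sum(q * x for q, x in zip(query, key)) / math.sqrt(c)
                for key in neighbors
            ]
            weights = softmax(scores)
            # Reassemble the output feature as a weighted sum of the values.
            row.append([
                sum(wt * v[ch] for wt, v in zip(weights, neighbors))
                for ch in range(c)
            ])
        out.append(row)
    return out
```

In this simplified form, setting `k=1` leaves a single neighbor with weight 1, so the sketch reduces to nearest-neighbor upsampling. That is one small instance of a fixed upsampler appearing as a special case of the attention-based formulation.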

### What problem does this paper attempt to address?

This paper addresses the shortcomings of existing upsampling methods in deep convolutional neural networks. Specifically, existing upsamplers either lack specific feature guidance or rely on high-resolution feature maps, which degrades performance and limits flexibility. To solve these problems, the authors propose **LDA-AQU** (Local Deformable Attention-based Adaptive Query-guided Upsampling), an adaptive query-guided upsampling method based on the local deformable attention mechanism.

The main contributions of LDA-AQU include:

1. **Introducing the local self-attention mechanism**: Local self-attention guides the upsampling task, so that the aggregation weights of neighboring points adapt to the features of each query point.
2. **Adding the deformation mechanism**: The deformation mechanism dynamically adjusts the positions of neighboring points, which narrows the potential semantic gap between upsampled points and their neighbors and better suits the upsampling requirements of complex scenarios.
3. **Lightweight design**: LDA-AQU is a lightweight upsampler that can be easily integrated into various model architectures without significantly increasing the computational cost.

### Advantages of LDA-AQU

- **Single-layer operation**: LDA-AQU operates on a single layer and does not require high-resolution feature maps as input.
- **Query-guiding ability**: It uses the features of query points to generate dynamic upsampling kernels.
- **Local deformation ability**: It dynamically adjusts the positions of neighboring points according to the context information of query points.
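The deformation idea described above can be sketched for a single upsampled point: neighbors are sampled at shifted, fractional positions by bilinear interpolation and then aggregated with attention weights. This is a hedged illustration, not the authors' code. In LDA-AQU the offsets are guided by the query features, whereas here they are passed in by hand, and the names `bilinear_sample` and `deformable_attention_point` are hypothetical.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly sample an H x W x C nested-list map at fractional (y, x).

    Coordinates are clamped to the map border (an assumption of this sketch).
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return [
        (1 - wy) * ((1 - wx) * feat[y0][x0][ch] + wx * feat[y0][x1][ch])
        + wy * ((1 - wx) * feat[y1][x0][ch] + wx * feat[y1][x1][ch])
        for ch in range(c)
    ]


def deformable_attention_point(feat, query, center, offsets, k=3):
    """Aggregate k*k deformed neighbors of `center` for one query vector.

    `offsets` holds one (dy, dx) shift per neighbor; a real model would
    predict these from the query features (here they are given by hand).
    """
    if len(offsets) != k * k:
        raise ValueError("expected one offset per neighbor")
    r = k // 2
    cy, cx = center
    base = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    # Sample each neighbor at its regular grid position plus its offset.
    values = [
        bilinear_sample(feat, cy + dy + oy, cx + dx + ox)
        for (dy, dx), (oy, ox) in zip(base, offsets)
    ]
    # Attention weights from scaled dot products between query and samples.
    norm = math.sqrt(len(query))
    scores = [sum(q * v for q, v in zip(query, val)) / norm for val in values]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [
        sum(e / total * val[ch] for e, val in zip(exps, values))
        for ch in range(len(query))
    ]
```

With all offsets set to `(0.0, 0.0)` the sampled positions fall back to the regular k x k grid, so the deformable variant contains the plain local-attention neighborhood as its undeformed case.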
### Experimental verification

The authors evaluated LDA-AQU on four dense prediction tasks: object detection, instance segmentation, panoptic segmentation, and semantic segmentation. LDA-AQU outperforms previous state-of-the-art upsamplers on all four tasks, improving over the baseline models by 1.7 AP, 1.5 AP, 2.0 PQ, and 2.5 mIoU, respectively.

### Summary

By introducing the local self-attention mechanism and the deformation mechanism, LDA-AQU addresses the lack of feature guidance and the limited flexibility of existing upsampling methods, improving performance across multiple dense prediction tasks.

LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention

Learning to Upsample by Learning to Sample

Lighten CARAFE: Dynamic Lightweight Upsampling with Guided Reassemble Kernels

Semantically-Adaptive Upsampling for Layout-to-Image Translation

More than Encoder: Introducing Transformer Decoder to Upsample

DSAP: Dynamic Sparse Attention Perception Matcher for Accurate Local Feature Matching

A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling

SAPA: Similarity-Aware Point Affiliation for Feature Upsampling

Asymmetric Attention Upsampling: Rethinking Upsampling for Biological Image Segmentation

SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction with Self-Projection Optimization

Learning Continuous Implicit Field with Local Distance Indicator for Arbitrary-Scale Point Cloud Upsampling

Upsampling Autoencoder for Self-Supervised Point Cloud Learning

Learning Affinity-Aware Upsampling for Deep Image Matting

Data-driven Upsampling of Point Clouds

SIERRA: A robust bilateral feature upsampler for dense prediction

UDAformer: Underwater image enhancement based on dual attention transformer

Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation

Density-imbalance-eased LiDAR Point Cloud Upsampling via Feature Consistency Learning

Self-Attentive Pooling for Efficient Deep Learning

Lunet: an enhanced upsampling fusion network with efficient self-attention for semantic segmentation

CARAFE: Content-Aware ReAssembly of FEatures