Abstract: Dense image prediction tasks demand features with strong category information and precise spatial boundary details at high resolution. To achieve this, modern hierarchical models often utilize feature fusion, directly adding upsampled coarse features from deep layers and high-resolution features from lower levels. In this paper, we observe rapid variations in fused feature values within objects, resulting in intra-category inconsistency due to disturbed high-frequency features. Additionally, blurred boundaries in fused features lack accurate high frequency, leading to boundary displacement. Building upon these observations, we propose Frequency-Aware Feature Fusion (FreqFusion), integrating an Adaptive Low-Pass Filter (ALPF) generator, an offset generator, and an Adaptive High-Pass Filter (AHPF) generator. The ALPF generator predicts spatially-variant low-pass filters to attenuate high-frequency components within objects, reducing intra-class inconsistency during upsampling. The offset generator refines large inconsistent features and thin boundaries by replacing inconsistent features with more consistent ones through resampling, while the AHPF generator enhances high-frequency detailed boundary information lost during downsampling. Comprehensive visualization and quantitative analysis demonstrate that FreqFusion effectively improves feature consistency and sharpens object boundaries. Extensive experiments across various dense prediction tasks confirm its effectiveness. The code is made publicly available at https://github.com/Linwei-Chen/FreqFusion.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper aims to address two major issues in dense image prediction tasks: **intra-category inconsistency** and **boundary displacement**. Specifically:

1. **Intra-category Inconsistency**: During feature fusion, different parts within the same category may exhibit significant variations in feature values, leading to reduced intra-category feature consistency. For example, the wheels of a car may have more texture and darkness, while the windows appear smooth and bright. Standard feature fusion methods cannot effectively handle these inconsistent features, and simple bilinear upsampling may even exacerbate the problem, as an inconsistent feature may be upsampled to multiple pixels, further increasing intra-category inconsistency.
2. **Boundary Displacement**: During feature fusion, high-frequency information at the boundaries is often lost, leading to blurred boundaries and consequently boundary displacement. Previous studies have shown that simple interpolation methods tend to overly smooth features, resulting in the loss of boundary information.

To address these issues, the authors propose the **Frequency-Aware Feature Fusion (FreqFusion)** method. This method enhances the feature fusion process through three key components:

- **Adaptive Low-Pass Filter Generator (ALPF)**: Predicts spatially varying low-pass filters to reduce high-frequency components within objects, thereby reducing intra-category inconsistency.
- **Offset Generator**: Refines large areas and fine boundaries by resampling to replace inconsistent features.
- **Adaptive High-Pass Filter Generator (AHPF)**: Extracts high-frequency details lost during downsampling from low-level features, thereby enhancing boundary information.

Through the collaborative work of these components, FreqFusion can restore high-quality fused features with consistent category information and clear boundaries.
Experimental results show that FreqFusion significantly improves performance in various dense prediction tasks, including semantic segmentation, object detection, instance segmentation, and panoptic segmentation.
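
The two problems and the low-pass/high-pass idea can be illustrated with a toy 1-D sketch in plain Python. This is not the authors' implementation. The function names (`upsample_linear`, `low_pass`, `high_pass`, `fuse`) are hypothetical, a fixed 3-tap box filter stands in for the predicted spatially varying filters, and the offset generator is omitted entirely. The sketch only shows that linear upsampling spreads one inconsistent coarse value over several pixels, that low-pass filtering attenuates that value, and that a high-pass residual responds only near a boundary.

```python
# Toy 1-D illustration only. All names are hypothetical; FreqFusion itself
# predicts spatially varying filters, whereas this uses a fixed box filter.

def upsample_linear(x, factor=2):
    """Linear interpolation with half-pixel centers and clamped edges."""
    out = []
    for i in range(len(x) * factor):
        pos = (i + 0.5) / factor - 0.5          # output index in input coordinates
        pos = min(max(pos, 0.0), len(x) - 1)    # clamp to the valid range
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        w = pos - lo
        out.append((1 - w) * x[lo] + w * x[hi])
    return out

def low_pass(x, k=3):
    """Box filter of odd width k with edge replication (fixed, not learned)."""
    r, n = k // 2, len(x)
    return [
        sum(x[min(max(i + d, 0), n - 1)] for d in range(-r, r + 1)) / k
        for i in range(n)
    ]

def high_pass(x, k=3):
    """High-frequency residual: the signal minus its low-pass version."""
    return [v - s for v, s in zip(x, low_pass(x, k))]

def fuse(fine, coarse, factor=2):
    """Assumed toy fusion: smooth the coarse feature before upsampling,
    then add the fine feature plus its high-frequency residual."""
    smoothed = upsample_linear(low_pass(coarse), factor)
    detail = high_pass(fine)
    return [f + d + s for f, d, s in zip(fine, detail, smoothed)]

# One inconsistent value (5) inside an otherwise uniform object (all 1s).
coarse = [1, 1, 5, 1]
# A high-resolution feature with a sharp boundary between positions 3 and 4.
fine = [0, 0, 0, 0, 1, 1, 1, 1]

print("upsampled coarse:", upsample_linear(coarse))
print("low-passed coarse:", low_pass(coarse))
print("high-pass of fine:", high_pass(fine))
print("toy fused feature:", fuse(fine, coarse))
```

In this toy setting the single outlier in `coarse` affects four of the eight upsampled positions, while the high-pass residual of `fine` is nonzero only at the two positions adjacent to the step. A learned, spatially varying filter could smooth strongly inside objects and weakly at boundaries, which a fixed box filter cannot do.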

Frequency-aware Feature Fusion for Dense Image Prediction

FAFusion: Learning for Infrared and Visible Image Fusion via Frequency Awareness

Attention-based Fusion Factor in FPN for Object Detection

Exploring Spatial Frequency Information for Enhanced Video Prediction Quality

Frequency Integration and Spatial Compensation Network for Infrared and Visible Image Fusion

TF(2)AN: A Temporal-Frequency Fusion Attention Network for Spectrum Energy Level Prediction

SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion

Pansharpening via Frequency-Aware Fusion Network With Explicit Similarity Constraints

An efficient frequency domain fusion network of infrared and visible images

A General Spatial-Frequency Learning Framework for Multimodal Image Fusion

Adaptive-SFSDAF for Spatiotemporal Image Fusion that Selectively Uses Class Abundance Change Information

DHFNet: Decoupled Hierarchical Fusion Network for RGB-T dense prediction tasks

Infrared and Visible Image Fusion Based on Adaptive Feature Enhancement and Generator Path Interaction

Image Fusion Based on Feature Decoupling and Proportion Preserving.

Deepfake Detection Based on the Adaptive Fusion of Spatial‐Frequency Features

AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention

SFCFusion: Spatial–Frequency Collaborative Infrared and Visible Image Fusion

DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion.

Adaptive multi-feature fusion via cross-entropy normalization for effective image retrieval