Abstract: We present an advanced study of the more challenging task of high-resolution salient object detection (HRSOD) from both the dataset and the network-framework perspectives. To compensate for the lack of HRSOD datasets, we thoughtfully collect a large-scale high-resolution salient object detection dataset, called UHRSD, containing 5,920 images of complex real-world scenes at 4K-8K resolutions. All the images are finely annotated at the pixel level, far exceeding previous low-resolution SOD datasets. Aiming to overcome the contradiction between sampling depth and receptive field size in previous methods, we propose a novel one-stage framework for the HRSOD task using a pyramid grafting mechanism. In general, transformer-based and CNN-based backbones are adopted to extract features independently from images of different resolutions, and these features are then grafted from the transformer branch to the CNN branch. An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically, guided by features from a different source during decoding. Moreover, we design an Attention Guided Loss (AGL) to explicitly supervise the attention matrix generated by CMGM, helping the network better interact with the attention from different branches. Comprehensive experiments on UHRSD and widely used SOD datasets demonstrate that our method can simultaneously locate salient objects and preserve rich details, outperforming state-of-the-art methods. To verify the generalization ability of the proposed framework, we apply it to the camouflaged object detection (COD) task. Notably, our method outperforms most state-of-the-art COD methods without bells and whistles.
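The grafting and supervision steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes that CNN tokens act as queries and Transformer tokens as keys and values, omits the learned projection matrices, and adds the attended features back to the CNN tokens residually. The loss is one hypothetical way to supervise the resulting attention matrix with ground truth: each attention row is compared against a target spread uniformly over the Transformer tokens that share the query token's salient/background label. The names `cross_attention_graft` and `attention_guided_loss` are illustrative only; the paper's exact CMGM and AGL formulations may differ.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_graft(cnn_tokens, trans_tokens):
    """Hypothetical CMGM-style grafting (identity projections assumed).

    cnn_tokens:   n x d list of CNN-branch features (queries)
    trans_tokens: m x d list of Transformer-branch features (keys and values)
    Returns the grafted CNN tokens and the n x m attention matrix.
    """
    d = len(cnn_tokens[0])
    scale = 1.0 / math.sqrt(d)
    # Scaled dot-product scores between every CNN token and every Transformer token.
    scores = [[scale * sum(q * k for q, k in zip(qi, kj)) for kj in trans_tokens]
              for qi in cnn_tokens]
    attn = [softmax(row) for row in scores]  # each row sums to 1
    # Residual update: CNN token + attention-weighted sum of Transformer tokens.
    grafted = [
        [c + sum(a * v[k] for a, v in zip(a_row, trans_tokens)) for k, c in enumerate(q)]
        for q, a_row in zip(cnn_tokens, attn)
    ]
    return grafted, attn


def attention_guided_loss(attn, cnn_labels, trans_labels, eps=1e-12):
    """Hypothetical AGL-style loss: cross-entropy between each attention row and a
    uniform target over Transformer tokens with the same ground-truth label."""
    total, rows = 0.0, 0
    for a_row, label in zip(attn, cnn_labels):
        matches = [j for j, lj in enumerate(trans_labels) if lj == label]
        if not matches:
            continue  # no same-label key to attend to; skip this query
        target = 1.0 / len(matches)
        total += -sum(target * math.log(a_row[j] + eps) for j in matches)
        rows += 1
    return total / rows if rows else 0.0


if __name__ == "__main__":
    cnn = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    trans = [[2.0, 0.0], [0.0, 2.0]]
    out, attn = cross_attention_graft(cnn, trans)
    print(attention_guided_loss(attn, [1, 0, 1], [1, 0]))  # low: attention agrees with labels
    print(attention_guided_loss(attn, [0, 1, 0], [1, 0]))  # high: attention contradicts labels
```

Supervising the attention rows directly, rather than only the final saliency map, gives the grafting step its own training signal; this mirrors the stated purpose of AGL, although the target construction here is an assumption.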


### What problems does this paper attempt to solve?

This paper addresses the challenges of high-resolution salient object detection (HRSOD). Specifically, the authors focus on the following issues:

1. **Lack of high-resolution datasets**
   - Most existing salient object detection (SOD) datasets are low-resolution (under 512×512 pixels), so models trained on them cannot learn the fine detail needed to handle high-resolution inputs.
   - To fill this gap, the authors construct UHRSD, a large-scale high-resolution salient object detection dataset of 5,920 images of complex real-world scenes at 4K-8K resolution, finely annotated at the pixel level.
2. **Contradiction between sampling depth and receptive field size**
   - In previous SOD methods, the network's sampling depth and its receptive field size work against each other. On a high-resolution input, a backbone of fixed depth has a receptive field that covers only a small part of the image, while adding downsampling stages to enlarge it discards fine detail. A conventional FPN (Feature Pyramid Network) extracts features only within a limited range, which makes it hard to capture global semantics and rich detail at the same time.
   - To resolve this, the authors propose a new one-stage framework based on a pyramid grafting mechanism. It combines the strengths of Transformers and CNNs: the two backbones extract features independently from images of different resolutions, and the features are then grafted from the Transformer branch to the CNN branch.
3. **Computational burden of high-resolution inputs**
   - Processing high-resolution images is computationally expensive. Feeding them directly into existing SOD models slows inference considerably, and details lost to downsampling inside the network are hard to recover.
   - The authors therefore adopt an asymmetric feature-extraction strategy: a lightweight CNN captures spatial detail from the large input, and a Transformer captures contextual features from a regular-size input. This keeps the computational cost manageable and makes the two branches complementary.
4. **Cross-model feature grafting and attention-guided loss**
   - To graft these heterogeneous features effectively, the authors propose an attention-based Cross-Model Grafting Module (CMGM) and further guide the grafting process with an Attention Guided Loss (AGL), which helps the network better combine features from the two branches.

In summary, the paper tackles the difficulties existing SOD methods face with high-resolution inputs by building a high-quality high-resolution dataset and proposing a new network architecture, with the aim of more accurate and efficient salient object detection.
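The computational-burden argument behind the asymmetric design in point 3 can be made concrete with a rough count. This sketch assumes a plain ViT-style Transformer with 16×16 patches and global self-attention (whose cost grows quadratically with the number of tokens) and a stride-1 convolution (whose cost grows linearly with the number of pixels); Swin-style windowed attention grows more slowly, and the 512×512 and 3840×2160 sizes are illustrative rather than the paper's exact training resolutions.

```python
import math


def patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Tokens from non-overlapping patch embedding (borders padded up)."""
    return math.ceil(height / patch) * math.ceil(width / patch)


def global_attention_pairs(height: int, width: int, patch: int = 16) -> int:
    """Query-key pairs in one global self-attention layer: quadratic in tokens."""
    n = patch_tokens(height, width, patch)
    return n * n


def conv_positions(height: int, width: int) -> int:
    """Output positions of a stride-1 convolution: linear in pixel count."""
    return height * width


regular = (512, 512)   # assumed "regular" input size for the Transformer branch
uhd_4k = (3840, 2160)  # a 4K frame, the low end of UHRSD's resolution range

if __name__ == "__main__":
    pixel_ratio = conv_positions(*uhd_4k) / conv_positions(*regular)
    attn_ratio = global_attention_pairs(*uhd_4k) / global_attention_pairs(*regular)
    print(f"conv cost grows ~{pixel_ratio:.1f}x")      # conv cost grows ~31.6x
    print(f"attention cost grows ~{attn_ratio:.0f}x")  # attention cost grows ~1001x
```

Under these assumptions, moving from 512×512 to 4K multiplies convolution work by about 32 but global attention work by about 1,000, which is why giving the large input to a lightweight CNN and the regular-size input to the Transformer is the cheaper pairing.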

PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network

Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection

Looking for the Detail and Context Devils: High-Resolution Salient Object Detection

Cross-Modal Fusion and Progressive Decoding Network for RGB-D Salient Object Detection

Disentangled High Quality Salient Object Detection

Dual-path Processing Network for High-resolution Salient Object Detection

High-Resolution Network with Transformer Embedding Parallel Detection for Small Object Detection in Optical Remote Sensing Images

CRNet: Channel-Enhanced Remodeling-Based Network for Salient Object Detection in Optical Remote Sensing Images

Multilevel Interactive Reverse-Guided Network for Salient Object Detection in Optical Remote Sensing Images

Cross-modal refined adjacent-guided network for RGB-D salient object detection

CSNet: a ConvNeXt-based Siamese network for RGB-D salient object detection

HFENet: Hybrid feature encoder network for detecting salient objects in RGB-thermal images

Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images

Go Closer to See Better: Camouflaged Object Detection via Object Area Amplification and Figure-Ground Conversion

HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection

A fast self-attention cascaded network for object detection in large scene remote sensing images

Salient Object Detection Via Multi-Scale Neural Network.

A Unified Structure for Efficient RGB and RGB-D Salient Object Detection

Transcending Pixels: Boosting Saliency Detection via Scene Understanding from Aerial Imagery

A parallel down-up fusion network for salient object detection in optical remote sensing images

CIR-Net: Cross-Modality Interaction and Refinement for RGB-D Salient Object Detection