Abstract: A light field camera can reconstruct 3D scenes using captured multi-focus images that contain rich spatial geometric information, enhancing applications in stereoscopic photography, virtual reality, and robotic vision. In this work, a state-of-the-art salient object detection model for multi-focus light field images, called LFSamba, is introduced to emphasize four main insights: (a) Efficient feature extraction, where SAM is used to extract modality-aware discriminative features; (b) Inter-slice relation modeling, leveraging Mamba to capture long-range dependencies across multiple focal slices, thus extracting implicit depth cues; (c) Inter-modal relation modeling, utilizing Mamba to integrate all-focus and multi-focus images, enabling mutual enhancement; (d) Weakly supervised learning capability, developing a scribble annotation dataset from an existing pixel-level mask dataset, establishing the first scribble-supervised baseline for light field salient object detection. https://github.com/liuzywen/LFScribble
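To make point (b) more concrete, here is a minimal sketch of how a state-space recurrence can carry information across focal slices. It is not the LFSamba implementation: the function name `ssm_scan`, the per-slice feature layout, and the constant parameters `a` and `b` are all assumptions for illustration. Mamba makes these parameters input-dependent (selective), which this sketch deliberately omits.

```python
from typing import List


def ssm_scan(slice_features: List[List[float]],
             a: float = 0.9,
             b: float = 0.1) -> List[List[float]]:
    """Run the linear recurrence h_t = a * h_{t-1} + b * x_t over focal slices.

    slice_features[t] is a feature vector for focal slice t (hypothetical
    layout). The parameters a and b are fixed constants here; a selective
    state-space model such as Mamba would compute them from the input.
    """
    if not slice_features:
        return []
    dim = len(slice_features[0])
    state = [0.0] * dim  # hidden state shared across the slice sequence
    outputs = []
    for x in slice_features:
        # Decay the previous state and mix in the current slice.
        state = [a * h + b * v for h, v in zip(state, x)]
        outputs.append(list(state))
    return outputs
```

In this toy version, the output at the last slice still contains a decayed contribution from the first slice, which is the sense in which a recurrent scan can relate slices focused at different depths.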

What problem does this paper attempt to address?

This paper addresses salient object detection in multi-focus light field images. The authors propose a model named LFSamba, which combines SAM (Segment Anything Model) and Mamba to improve the detection of salient objects in such images. The specific problems the paper sets out to solve are:

1. **Effective Feature Extraction**: Multi-focus light field images contain rich spatial geometric information, but extracting it efficiently is a challenge. The authors use SAM to extract modality-aware discriminative features.
2. **Inter-slice Relationship Modeling**: A multi-focus image consists of multiple focal slices, each focused at a different depth. To capture long-range dependencies between these slices and extract implicit depth cues, the authors introduce Mamba.
3. **Cross-modal Relationship Modeling**: To better fuse all-focus images and multi-focus images, the authors design a cross-modal Mamba module so that features from the two modalities enhance each other.
4. **Weakly-Supervised Learning Ability**: Deep learning models learn the mapping from input to output from annotations, and existing methods usually require dense pixel-level annotation, which is labor-intensive. The authors construct a sparsely annotated (scribble) dataset from an existing pixel-level mask dataset and develop a weakly supervised learning method, reducing the annotation cost.

In summary, LFSamba addresses these problems as follows:

- Use SAM for efficient feature extraction.
- Use Mamba to capture long-range dependencies across the focal slices of multi-focus images.
- Design a cross-modal Mamba module to fuse features of different modalities.
- Construct a sparsely annotated dataset and adopt weakly supervised learning to reduce the annotation workload.

With these components, the paper reports improved performance on salient object detection in multi-focus light field images.
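The weakly supervised setting can be illustrated with a partial cross-entropy loss, which scores only the pixels covered by a scribble and ignores the rest. This is a common choice in scribble-supervised segmentation, but the summary does not state which loss LFSamba uses, so treat it as an assumption. The function name `partial_bce` and the label encoding (`1`, `0`, `None`) are hypothetical.

```python
import math
from typing import List, Optional


def partial_bce(pred: List[float],
                scribble: List[Optional[int]],
                eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over annotated pixels only.

    pred: predicted saliency probabilities in [0, 1], flattened.
    scribble: 1 = foreground scribble, 0 = background scribble,
              None = unlabeled pixel (assumed encoding).
    """
    if len(pred) != len(scribble):
        raise ValueError("pred and scribble must have the same length")
    total = 0.0
    count = 0
    for p, y in zip(pred, scribble):
        if y is None:
            continue  # unlabeled pixels contribute nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    # With no annotated pixels there is nothing to supervise.
    return total / count if count else 0.0
```

Because unlabeled pixels are skipped, the prediction at those pixels does not change the loss value; only the scribbled pixels supervise the model.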

LFSamba: Marry SAM with Mamba for Light Field Salient Object Detection

A Learning-Based Method Using Data Augmentation for Light Field Salient Object Detection

Salient Object Detection with High-Level Prior Based on Bayesian Fusion.

LRNet: lightweight attention-oriented residual fusion network for light field salient object detection

LFMamba: Light Field Image Super-Resolution with State Space Model

Light Field Salient Object Detection with Sparse Views via Complementary and Discriminative Interaction Network

LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras

Rethinking Feature Mining for Light Field Salient Object Detection

Learning Synergistic Attention for Light Field Salient Object Detection

Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance

Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning

Focal stack based light field salient object detection via 3D–2D convolution hybrid network

ARFNet: Attention-Oriented Refinement and Fusion Network for Light Field Salient Object Detection

Light Field Saliency Detection With Deep Convolutional Networks

Learning Adaptive Fusion Bank for Multi-modal Salient Object Detection

Exploring Focus and Depth-Induced Saliency Detection for Light Field

Deep Coarse-to-Fine Dense Light Field Reconstruction With Flexible Sampling and Geometry-Aware Fusion

Spatial Attention-Guided Light Field Salient Object Detection Network with Implicit Neural Representation

Light field salient object detection: A review and benchmark

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection.