Abstract: This paper studies amodal image segmentation: predicting entire object segmentation masks, including both visible and invisible (occluded) parts. In previous work, the amodal segmentation ground truth on real images is usually obtained by manual annotation and is thus subjective. In contrast, we use 3D data to establish an automatic pipeline that determines authentic ground-truth amodal masks for partially occluded objects in real images. This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels. To better handle the amodal completion task in the wild, we explore two architecture variants: a two-stage model that first infers the occluder, followed by amodal mask completion; and a one-stage model that exploits the representation power of Stable Diffusion for amodal segmentation across many categories. Without bells and whistles, our method achieves new state-of-the-art performance on amodal segmentation datasets that cover a large variety of objects, including COCOA and our new MP3D-Amodal dataset. The dataset, model, and code are available at
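The abstract does not spell out how 3D data yields objective amodal masks. One way to see it is a depth-test sketch: assuming each object can be rendered alone from the image's camera into its own depth map (a hypothetical input format, not necessarily how MP3D-Amodal is built), the amodal mask is simply the object's full projection, while the visible (modal) mask and the occluders fall out of a z-buffer comparison. The function name `amodal_ground_truth` is illustrative only.

```python
import numpy as np


def amodal_ground_truth(object_depths):
    """Derive amodal masks, visible (modal) masks and occluder ids.

    object_depths: float array of shape (N, H, W). Hypothetical input format:
    object_depths[i] is the depth of object i rendered alone from the image's
    camera, with np.inf wherever object i does not project.
    """
    n = object_depths.shape[0]
    # Amodal mask: the object's full projection, ignoring all other objects.
    amodal = np.isfinite(object_depths)
    # Scene z-buffer: index of the nearest object at each pixel
    # (ties go to the lower index; empty pixels map to 0 but are masked below).
    front_id = np.argmin(object_depths, axis=0)
    # Modal mask: pixels where the object is the front-most surface.
    modal = amodal & (front_id[None] == np.arange(n)[:, None, None])
    occluders = []
    for i in range(n):
        hidden = amodal[i] & ~modal[i]  # occluded part of object i
        # Whatever object is in front at a hidden pixel is an occluder of i.
        occluders.append(sorted(set(front_id[hidden].tolist())))
    return amodal, modal, occluders


# Toy 1x4 scene: a far wall (object 0) partly hidden by a nearer box (object 1).
depths = np.array([[[5.0, 5.0, 5.0, 5.0]],
                   [[np.inf, 2.0, 2.0, np.inf]]])
amodal, modal, occluders = amodal_ground_truth(depths)
print(modal[0].astype(int))  # [[1 0 0 1]]
print(occluders)             # [[1], []]
```

Because the hidden part of the wall comes from geometry rather than from an annotator's guess, the same scene always produces the same amodal mask, which is the property the abstract contrasts with manual annotation.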

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper investigates **amodal image segmentation**: predicting the complete segmentation mask of an object in an image, including both its visible and occluded (hidden) parts. Specifically, it addresses two key issues:

1. **Generating ground truth for amodal segmentation in real scenes**
   - In previous work, amodal ground truth is typically obtained through manual annotation, which makes it subjective and inconsistent.
   - This paper proposes an automatic pipeline that uses 3D data to derive authentic amodal masks for partially occluded objects in real images.
2. **Handling amodal completion when the occluder is unknown**
   - Existing methods usually require the occluder's mask as input, but in practice this mask is often unavailable or hard to define.
   - This paper explores two architecture variants, a one-stage model and a two-stage model, that perform amodal segmentation without being given the occluder's mask.

### Main Contributions

1. **A new benchmark for amodal evaluation**
   - **MP3D-Amodal** contains a large number of real images covering many object categories, each object annotated with an authentic amodal mask.
   - The ground truth is generated by projecting the 3D structure of the scene onto the image, so it does not depend on annotators' guesses about hidden regions.
2. **New models for amodal completion**
   - **Two-stage model (OccAmodal)**: first predicts the occluder's mask, then uses that predicted mask to complete the amodal segmentation.
   - **One-stage model (SDAmodal)**: exploits the representation power of a pre-trained Stable Diffusion model to infer the amodal mask directly from the image and the modal (visible) mask.
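The two-stage data flow described above can be sketched with stand-in stages. The names `two_stage_amodal`, `toy_occluder` and `toy_completion` are hypothetical, the toy stages are trivial heuristics standing in for trained networks, and the final masking step is an assumed post-processing choice rather than something the summary attributes to OccAmodal. What the sketch shows is the interface: stage 2 receives the predicted occluder, so the completion can be restricted to regions that something is actually covering.

```python
from typing import Callable

import numpy as np

# Stage signatures (hypothetical): masks are boolean (H, W) arrays.
OccluderFn = Callable[[np.ndarray, np.ndarray], np.ndarray]
CompletionFn = Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]


def two_stage_amodal(image, modal, predict_occluder: OccluderFn,
                     complete_amodal: CompletionFn):
    # Stage 1: infer which pixels belong to whatever is hiding the object.
    occluder = predict_occluder(image, modal)
    # Stage 2: complete the mask, conditioned on image, modal and occluder masks.
    raw = complete_amodal(image, modal, occluder)
    # Assumed post-processing: keep every visible pixel, and accept hidden
    # pixels only where the predicted occluder could be covering the object.
    return modal | (raw & occluder)


def toy_occluder(image, modal):
    # Stand-in for a network: bright pixels outside the visible mask.
    return (image > 0.5) & ~modal


def toy_completion(image, modal, occluder):
    # Stand-in for a network: fill each row between its outermost visible pixels.
    out = np.zeros_like(modal)
    for r, row in enumerate(modal):
        cols = np.flatnonzero(row)
        if cols.size:
            out[r, cols[0]:cols[-1] + 1] = True
    return out


# A 1x5 object seen at columns 0 and 3, with a bright pole in front at columns 1-2.
image = np.array([[0.0, 1.0, 1.0, 0.0, 0.0]])
modal = np.array([[True, False, False, True, False]])
print(two_stage_amodal(image, modal, toy_occluder, toy_completion).astype(int))
# [[1 1 1 1 0]]
```

A one-stage model such as SDAmodal would replace both stages with a single mapping from (image, modal mask) to an amodal mask; it is not sketched here because it depends on a pretrained Stable Diffusion backbone.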
### Experimental Results

- **Performance**: On datasets covering a wide variety of objects, including COCOA and MP3D-Amodal, the proposed methods achieve new state-of-the-art amodal segmentation performance, and they handle objects from different domains and categories particularly well.
- **Robustness**: Even without the occluder's mask as input, the proposed methods complete amodal masks effectively, which matters in practical applications where that mask is unavailable.

### Conclusion

By leveraging real 3D data, this paper addresses ground-truth generation for amodal segmentation and proposes effective methods for amodal completion when the occluder is unknown. Together, these contributions make amodal segmentation more practical in real-world scenarios.

Amodal Ground Truth and Completion in the Wild

Adapting the Segment Anything Model for Multi-modal Retinal Anomaly Detection and Localization

Image Amodal Completion: A Survey

Open-World Amodal Appearance Completion

Amodal Layout Completion in Complex Outdoor Scenes

Amodal segmentation just like doing a jigsaw

Amodal Segmentation Based on Visible Region Segmentation and Shape Prior

PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus

BLADE: Box-Level Supervised Amodal Segmentation through Directed Expansion

Sequential Amodal Segmentation via Cumulative Occlusion Learning

Amodal Depth Anything: Amodal Depth Estimation in the Wild

M3AE: Multimodal Representation Learning for Brain Tumor Segmentation with Missing Modalities

Coarse-to-Fine Amodal Segmentation with Shape Prior

Self-supervised Amodal Video Object Segmentation

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Amodal Instance Segmentation Via Prior-Guided Expansion

Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation

XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation

Using Diffusion Priors for Video Amodal Segmentation

Human De-occlusion: Invisible Perception and Recovery for Humans

MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic Segmentation