Amodal Ground Truth and Completion in the Wild

Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman
2024-04-30
Abstract: This paper studies amodal image segmentation: predicting entire object segmentation masks including both visible and invisible (occluded) parts. In previous work, the amodal segmentation ground truth on real images is usually predicted by manual annotation and thus is subjective. In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images. This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels. To better handle the amodal completion task in the wild, we explore two architecture variants: a two-stage model that first infers the occluder, followed by amodal mask completion; and a one-stage model that exploits the representation power of Stable Diffusion for amodal segmentation across many categories. Without bells and whistles, our method achieves a new state-of-the-art performance on amodal segmentation datasets that cover a large variety of objects, including COCOA and our new MP3D-Amodal dataset. The dataset, model, and code are available at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper primarily investigates the problem of **amodal image segmentation**, which involves predicting the complete segmentation mask of an object in an image, including both visible and occluded (hidden) parts. Specifically, the paper attempts to address the following two key issues:

1. **Generating Ground Truth for Amodal Segmentation in Real Scenes**:
   - In previous works, the ground truth for amodal segmentation is typically obtained through manual annotation, which leads to subjectivity and inconsistency.
   - This paper proposes an automatic pipeline based on 3D data to generate true amodal masks for partially occluded objects in real images.
2. **Handling Amodal Completion Tasks in the Presence of Unknown Occluders**:
   - Existing methods usually require the occluder's mask to be provided, but in practical applications, the occluder's mask is often unavailable or hard to define.
   - This paper explores two architectural variants, a one-stage model and a two-stage model, that can perform amodal segmentation without providing the occluder's mask.

### Main Contributions

1. **Constructing a New Benchmark Dataset for Amodal Evaluation**:
   - A new dataset named **MP3D-Amodal** is proposed, containing a large number of real images and various categories of objects, each with true amodal masks.
   - The dataset is generated by projecting the 3D structure of the scene onto the image, ensuring the accuracy of the ground truth.
2. **Developing New Models for Amodal Completion**:
   - **Two-Stage Model (OccAmodal)**: First predicts the occluder's mask, then uses the predicted occluder's mask to complete the amodal segmentation.
   - **One-Stage Model (SDAmodal)**: Utilizes the powerful representation capability of the pre-trained Stable Diffusion model to directly infer the amodal mask from the image and modal mask.
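The idea of obtaining amodal ground truth by projecting 3D scene structure into the image can be illustrated with a small depth-comparison sketch. This is not the authors' pipeline, which the summary does not detail. It assumes two depth maps are available from the same camera: one rendering the object alone and one rendering the full scene. The function names, the `tol` parameter, and the nested-list depth format are all hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

Depth = List[List[float]]  # per-pixel depth; math.inf means "no surface here"
Mask = List[List[bool]]


def amodal_and_modal_masks(
    object_depth: Depth, scene_depth: Depth, tol: float = 0.01
) -> Tuple[Mask, Mask]:
    """Derive (amodal, modal) masks for one object from two depth renderings.

    object_depth: depth of the object rendered alone (assumed input).
    scene_depth:  depth of the whole scene from the same camera (assumed input).
    tol:          slack allowed when deciding the object is the nearest surface.
    """
    amodal: Mask = []
    modal: Mask = []
    for obj_row, scene_row in zip(object_depth, scene_depth):
        # Amodal: every pixel the object projects to, occluded or not.
        amodal_row = [math.isfinite(d) for d in obj_row]
        # Modal: pixels where the object is also the closest surface in the scene.
        modal_row = [
            a and (d - s) <= tol
            for a, d, s in zip(amodal_row, obj_row, scene_row)
        ]
        amodal.append(amodal_row)
        modal.append(modal_row)
    return amodal, modal


def occlusion_rate(amodal: Mask, modal: Mask) -> float:
    """Fraction of the object's projected area that is hidden."""
    total = sum(sum(row) for row in amodal)
    visible = sum(sum(row) for row in modal)
    return 0.0 if total == 0 else 1.0 - visible / total


if __name__ == "__main__":
    INF = math.inf
    # Toy 2x4 example: the object sits at depth 2.0; an occluder at depth 1.0
    # covers part of it on the right.
    object_depth = [[INF, 2.0, 2.0, 2.0], [INF, 2.0, 2.0, 2.0]]
    scene_depth = [[5.0, 2.0, 1.0, 1.0], [5.0, 2.0, 2.0, 1.0]]
    amodal, modal = amodal_and_modal_masks(object_depth, scene_depth)
    print(occlusion_rate(amodal, modal))
```

In this toy case the amodal mask covers six pixels and the modal mask three, so half of the object is occluded. Only objects with a non-zero occlusion rate would be of interest for an amodal completion benchmark.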
### Experimental Results

- **Performance Improvement**: Experiments on multiple datasets show that the proposed methods achieve new state-of-the-art performance in amodal segmentation tasks, especially excelling in handling objects from different domains and categories.
- **Robustness**: Even without providing the occluder's mask, the proposed methods can effectively complete the amodal segmentation task, demonstrating their potential in practical applications.

### Conclusion

By leveraging real 3D data, this paper not only addresses the ground truth generation problem in amodal segmentation tasks but also proposes effective methods for amodal completion in the presence of unknown occluders. These contributions provide significant technical support for the application of amodal segmentation in real-world scenarios.
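To make concrete how a predicted amodal mask might be scored against a ground-truth mask, the sketch below computes intersection-over-union (IoU). The summary does not state which metric the paper reports; IoU is used here only because it is a common choice for comparing segmentation masks. The function name `mask_iou` and the nested-list mask format are assumptions for illustration.

```python
from typing import List, Sequence

MaskRow = Sequence[int]  # 0/1 (or False/True) per pixel


def mask_iou(pred: List[MaskRow], gt: List[MaskRow]) -> float:
    """Intersection-over-union between two binary masks of equal shape."""
    inter = 0
    union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            p_on, g_on = bool(p), bool(g)
            inter += p_on and g_on  # pixel set in both masks
            union += p_on or g_on   # pixel set in at least one mask
    # Two empty masks agree perfectly; avoid dividing by zero.
    return 1.0 if union == 0 else inter / union


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 1, 0]]
    ground_truth = [[1, 1, 1], [0, 0, 0]]
    print(mask_iou(predicted, ground_truth))
```

Here two pixels are shared and four are set in at least one mask, giving an IoU of 0.5. Averaging this score over all evaluated objects yields a single benchmark number.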