ZIM: Zero-Shot Image Matting for Anything

Beomyoung Kim, Chanyong Shin, Joonhyun Jeong, Hyungsik Jung, Se-Yun Lee, Sewhan Chun, Dong-Hyun Hwang, Joonsang Yu

2024-11-01

Abstract: The recent segmentation foundation model, Segment Anything Model (SAM), exhibits strong zero-shot segmentation capabilities, but it falls short in generating fine-grained precise masks. To address this limitation, we propose a novel zero-shot image matting model, called ZIM, with two key contributions: First, we develop a label converter that transforms segmentation labels into detailed matte labels, constructing the new SA1B-Matte dataset without costly manual annotations. Training SAM with this dataset enables it to generate precise matte masks while maintaining its zero-shot capability. Second, we design the zero-shot matting model equipped with a hierarchical pixel decoder to enhance mask representation, along with a prompt-aware masked attention mechanism to improve performance by enabling the model to focus on regions specified by visual prompts. We evaluate ZIM using the newly introduced MicroMat-3K test set, which contains high-quality micro-level matte labels. Experimental results show that ZIM outperforms existing methods in fine-grained mask generation and zero-shot generalization. Furthermore, we demonstrate the versatility of ZIM in various downstream tasks requiring precise masks, such as image inpainting and 3D NeRF. Our contributions provide a robust foundation for advancing zero-shot matting and its downstream applications across a wide range of computer vision tasks. The code is available at https://github.com/naver-ai/ZIM.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the difficulty that existing segmentation models, such as the Segment Anything Model (SAM), have in generating fine-grained masks. Although SAM performs well in zero-shot segmentation, its masks often lack precision around complex boundaries and fine details such as hair strands. To overcome this, the authors propose ZIM, a zero-shot image matting model that generates high-quality, fine-grained matte masks while maintaining zero-shot capability.

The main contributions of the paper are:

1. **Label Converter**: A label converter transforms segmentation labels into detailed matte labels, which is used to construct a new large-scale, fine-grained matting dataset (SA1B-Matte) without costly manual annotation. Training SAM on this dataset lets it generate more precise matte masks while retaining its zero-shot capability.
2. **Zero-Shot Matting Model**: The matting model introduces a hierarchical pixel decoder and a prompt-aware masked attention mechanism. The hierarchical pixel decoder uses a multi-level feature pyramid design to make the mask feature representation richer and more robust. The prompt-aware masked attention mechanism lets the model focus on the regions specified by visual prompts.
3. **New Test Set**: A new test set (MicroMat-3K) containing 3,000 high-quality, micro-level matte labels is introduced to evaluate zero-shot matting models.

Together, these contributions provide a strong foundation for advancing zero-shot matting and its downstream applications, particularly in tasks that require high-precision masks, such as image inpainting and 3D NeRF.
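
To make the idea behind prompt-aware masked attention more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a box prompt is turned into a binary mask over a feature grid and that attention logits outside the box are set to negative infinity before the softmax, so a query can only attend to the prompted region. The function names `box_mask` and `masked_attention`, the inclusive box convention, and the single-query, single-head setup are all assumptions made for illustration; the paper's actual mechanism may differ (for example, in how point prompts are handled).

```python
import math


def box_mask(height, width, box):
    """Flattened binary mask over a height x width grid: 1 inside the box prompt.

    `box` is (x0, y0, x1, y1) in inclusive grid coordinates (assumed convention).
    """
    x0, y0, x1, y1 = box
    return [
        1 if (x0 <= x <= x1 and y0 <= y <= y1) else 0
        for y in range(height)
        for x in range(width)
    ]


def masked_attention(query, keys, values, mask):
    """Single-query scaled dot-product attention restricted to mask == 1 positions."""
    scale = 1.0 / math.sqrt(len(query))
    # Positions outside the prompt get a logit of -inf, i.e. zero attention weight.
    logits = [
        scale * sum(q * k for q, k in zip(query, key)) if m else -math.inf
        for key, m in zip(keys, mask)
    ]
    peak = max(logits)
    if peak == -math.inf:
        raise ValueError("mask excludes every position")
    exps = [math.exp(logit - peak) for logit in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


# Toy usage: a 4x4 feature grid with a box prompt covering its 2x2 centre.
H, W = 4, 4
mask = box_mask(H, W, box=(1, 1, 2, 2))
keys = [[float(i % 3), 1.0] for i in range(H * W)]
values = [[float(i)] for i in range(H * W)]
weights, output = masked_attention([0.5, -0.2], keys, values, mask)
```

In this toy setup, every position outside the box receives exactly zero weight, so the output is a weighted average of the values inside the prompted region only. A hard binary mask is the simplest choice; a soft mask (weights between 0 and 1 added as a bias to the logits) would be a natural variant for less precise prompts.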

