Matting by Generation

Zhixiang Wang, Baiang Li, Jian Wang, Yu-Lun Liu, Jinwei Gu, Yung-Yu Chuang, Shin'ichi Satoh
DOI: https://doi.org/10.1145/3641519.3657519
2024-07-31
Abstract: This paper introduces an innovative approach for image matting that redefines the traditional regression-based task as a generative modeling challenge. Our method harnesses the capabilities of latent diffusion models, enriched with extensive pre-trained knowledge, to regularize the matting process. We present novel architectural innovations that empower our model to produce mattes with superior resolution and detail. The proposed method is versatile and can perform both guidance-free and guidance-based image matting, accommodating a variety of additional cues. Our comprehensive evaluation across three benchmark datasets demonstrates the superior performance of our approach, both quantitatively and qualitatively. The results not only reflect our method's robust effectiveness but also highlight its ability to generate visually compelling mattes that approach photorealistic quality. The project page for this paper is available at https://lightchaserx.github.io/matting-by-generation/
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses image matting, recasting the traditional regression task as a generative modeling problem. It targets three main problems:

1. **Dependence on Additional Inputs in Traditional Methods**:
   - Existing image matting methods typically require additional inputs to reduce uncertainty, such as user-annotated trimaps or rough segmentation masks. These inputs may be imprecise, which degrades the final matting quality.
2. **Poor Handling of Boundary Details**:
   - Current methods often struggle in boundary regions, which usually have low contrast, low image quality, and imperfect manual annotations. The result is unnatural-looking composites.
3. **Limitations of End-to-End Methods**:
   - Some recent end-to-end methods eliminate the need for additional inputs, but they still suffer from insufficient training data and have difficulty generating high-quality boundary details.

The paper proposes a new method based on a generative diffusion model that leverages pre-trained knowledge to better handle image semantics and details, thereby improving matting quality. The method can handle high-resolution inputs and generate high-quality mattes without additional guidance, while remaining flexible enough to incorporate various forms of guidance (such as trimaps, rough masks, and text prompts).
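For readers unfamiliar with the terms used above, the following minimal sketch illustrates what an alpha matte and a trimap are. It is not code from the paper. It assumes the standard compositing model I = αF + (1 − α)B, which this summary does not state explicitly, and the function names `composite` and `trimap_from_alpha`, as well as the thresholds `lo` and `hi`, are hypothetical choices made for illustration. Real trimaps are usually drawn by users or derived with morphological operations rather than simple thresholding.

```python
# Illustrative sketch only; not the paper's method or code.
# Images are plain nested lists (rows of pixel values) to stay dependency-free.

def composite(alpha, fg, bg):
    """Blend per pixel with the standard model I = alpha * F + (1 - alpha) * B.

    alpha holds values in [0, 1]; fg and bg hold intensities of the same shape.
    """
    return [
        [a * f + (1.0 - a) * b for a, f, b in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha, fg, bg)
    ]


def trimap_from_alpha(alpha, lo=0.05, hi=0.95):
    """Label each pixel as background (0), unknown (128), or foreground (255).

    The thresholds lo and hi are assumed values, chosen only for this example.
    """
    def label(a):
        if a <= lo:
            return 0
        if a >= hi:
            return 255
        return 128

    return [[label(a) for a in row] for row in alpha]


# A one-row "image": pure background, a soft boundary pixel, pure foreground.
alpha = [[0.0, 0.5, 1.0]]
fg = [[200.0, 200.0, 200.0]]
bg = [[100.0, 100.0, 100.0]]

image = composite(alpha, fg, bg)
trimap = trimap_from_alpha(alpha)
```

The middle pixel shows why boundary regions are hard: its observed intensity mixes foreground and background, so the matte value there cannot be read off directly and must be estimated, which is the uncertainty that trimaps and other guidance are meant to reduce.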