Abstract: Natural image matting algorithms aim to predict the transparency map (alpha matte) with trimap guidance. However, producing a trimap often requires significant labor, which limits the large-scale application of matting algorithms. To address this issue, we propose Matte Anything (MatAny), an interactive natural image matting model that can produce high-quality alpha mattes from various simple hints. The key insight of MatAny is to generate a pseudo trimap automatically from contour and transparency predictions. In our work, we leverage vision foundation models to enhance the performance of natural image matting. Specifically, we use the Segment Anything Model to predict high-quality contours with user interaction and an open-vocabulary detector to predict the transparency of any object. Subsequently, a pre-trained image matting model generates alpha mattes from the pseudo trimaps. MatAny is the interactive matting algorithm with the most supported interaction methods and the best performance to date. It consists of orthogonal vision models and requires no additional training. We evaluate the performance of MatAny against several current image matting algorithms. MatAny achieves a 58.3% improvement in MSE and a 40.6% improvement in SAD over previous image matting methods with simple guidance, setting a new state of the art (SOTA). The source code and pre-trained models are available at https://github.com/hustvl/Matte-Anything.
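The abstract reports its gains in MSE and SAD. As a rough illustration of what these two metrics measure, the sketch below computes them with NumPy. Several details are assumptions and are not taken from the abstract: the function names are hypothetical, the optional restriction to an "unknown" region and the division of SAD by 1000 follow a common convention in matting benchmarks, and "improvement" is read here as relative error reduction against a baseline.

```python
import numpy as np


def sad(pred, gt, unknown=None):
    """Sum of absolute differences between two alpha mattes in [0, 1].

    Assumption: the sum is divided by 1000, a common reporting convention.
    If `unknown` (a boolean mask) is given, only those pixels are counted.
    """
    diff = np.abs(pred - gt)
    if unknown is not None:
        diff = diff[unknown]
    return float(diff.sum()) / 1000.0


def mse(pred, gt, unknown=None):
    """Mean squared error between two alpha mattes in [0, 1]."""
    sq = (pred - gt) ** 2
    if unknown is not None:
        sq = sq[unknown]
    return float(sq.mean())


def relative_improvement(baseline_error, new_error):
    """Percentage by which an error metric dropped relative to a baseline."""
    return (baseline_error - new_error) / baseline_error * 100.0
```

Both metrics are errors, so lower is better, and a positive relative improvement means the new method's error is smaller than the baseline's.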

What problem does this paper attempt to address?

The paper addresses the heavy manual labor needed to produce trimaps for natural image matting. Traditional natural image matting algorithms rely on trimaps to guide the prediction of alpha mattes, but producing trimaps is labor-intensive, which limits the use of matting algorithms in large-scale applications. To solve this problem, the paper proposes an interactive natural image matting model named Matte Anything (MatAny), which generates high-quality alpha mattes from simple user prompts (such as points, boxes, scribbles, and text).

The key innovation of MatAny is the use of vision foundation models to generate pseudo-trimaps automatically. The Segment Anything Model (SAM) predicts high-quality contours, and an open-vocabulary detection model (such as Grounding DINO) predicts the transparency of any object. A pre-trained image matting model then generates alpha mattes from these pseudo-trimaps. This approach reduces the need for manual trimap annotation and improves the quality and efficiency of matting.

The main contributions of the paper are:
1. Proposing Matte Anything (MatAny), a high-performance matting framework with simple interaction, composed of decoupled vision models that require no additional training.
2. Designing a trimap generation strategy based on vision foundation models that produces high-quality adaptive pseudo-trimaps with minimal user input.
3. Evaluating the performance and generalization ability of MatAny on four image matting datasets. The results show that MatAny performs well across these datasets and, in particular, generalizes strongly to real-world images in a zero-shot setting, without fine-tuning.

Through these innovations, MatAny improves the performance and user experience of natural image matting while reducing labor costs.
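The pseudo-trimap step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the segmentation mask (for example, one produced by SAM) is already available as a boolean array, that the unknown band is obtained by eroding and dilating that mask, and that foreground inside any box the detector flags as transparent is relabeled as unknown. The function names, the radii, and the 0/128/255 encoding are hypothetical choices made for this example.

```python
import numpy as np


def _dilate(mask, radius):
    """Binary dilation with a (2r+1) x (2r+1) square kernel, NumPy only."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def _erode(mask, radius):
    """Binary erosion as the complement of dilating the complement.

    Pixels beyond the image border are treated as foreground, so a mask
    touching the border is not eroded from that side.
    """
    return ~_dilate(~mask, radius)


def pseudo_trimap(mask, erode_radius=1, dilate_radius=1, transparent_boxes=()):
    """Build a trimap: 0 = background, 128 = unknown, 255 = foreground.

    `transparent_boxes` holds hypothetical (x0, y0, x1, y1) pixel boxes
    (end-exclusive) that a detector marked as transparent objects.
    """
    mask = mask.astype(bool)
    trimap = np.zeros(mask.shape, dtype=np.uint8)
    trimap[_dilate(mask, dilate_radius)] = 128   # band around the contour
    trimap[_erode(mask, erode_radius)] = 255     # confident interior
    for x0, y0, x1, y1 in transparent_boxes:
        region = np.zeros(mask.shape, dtype=bool)
        region[y0:y1, x0:x1] = True
        # Assumption: transparent foreground is left for the matting model.
        trimap[(trimap == 255) & region] = 128
    return trimap
```

The resulting array would then be passed, together with the image, to a trimap-based matting model. Larger radii widen the unknown band, which gives the matting model more freedom around fine structures at the cost of less guidance.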

Matte Anything: Interactive Natural Image Matting with Segment Anything Models

Matting Anything

PP-Matting: High-Accuracy Natural Image Matting

Highly Efficient Natural Image Matting

User-Guided Deep Human Image Matting Using Arbitrary Trimaps

Attention-guided Temporally Coherent Video Object Matting

Semantic Image Matting

Semantic Image Matting: General and Specific Semantics

Semantic-guided Automatic Natural Image Matting with Trimap Generation Network and Light-weight Non-local Attention

Deep Image Matting with Sparse User Interactions.

Towards Natural Image Matting in the Wild via Real-Scenario Prior

Disentangled Image Matting

Natural Image Matting via Guided Contextual Attention

Portrait Matting via Semantic and Detail Guidance.

Boosting General Trimap-free Matting in the Real-World Image

Multi-guided-based image matting via boundary detection

Lightweight Image Matting via Efficient Non-local Guidance.

Improved Image Matting Via Real-time User Clicks and Uncertainty Estimation

Training Matting Models without Alpha Labels

Salient Image Matting