Abstract: Recent approaches attempt to adapt powerful interactive segmentation models, such as SAM, to interactive matting by fine-tuning them on synthetic matting datasets. However, models trained on synthetic data fail to generalize to complex scenes with occlusion. We address this challenge by proposing a new matting dataset based on the COCO dataset, namely COCO-Matting. Specifically, the construction of COCO-Matting includes accessory fusion and mask-to-matte, which select real-world complex images from COCO and convert semantic segmentation masks to matting labels. The resulting COCO-Matting comprises 38,251 human instance-level alpha mattes in complex natural scenarios. Furthermore, existing SAM-based matting methods extract intermediate features and masks from a frozen SAM and train only a lightweight matting decoder with end-to-end matting losses, which does not fully exploit the potential of the pre-trained SAM. Thus, we propose SEMat, which revamps the network architecture and training objectives. For network architecture, the proposed feature-aligned transformer learns to extract fine-grained edge and transparency features, and the proposed matte-aligned decoder aims to segment matting-specific objects and convert coarse masks into high-precision mattes. For training objectives, the proposed regularization and trimap loss aim to retain the prior from the pre-trained model and push the matting logits extracted from the mask decoder to contain trimap-based semantic information. Extensive experiments across seven diverse datasets demonstrate the superior performance of our method, supporting its efficacy in interactive natural image matting. We open-source our code, models, and dataset at <a class="link-external link-https" href="https://github.com/XiaRho/SEMat" rel="external noopener nofollow">this https URL</a>.

Semantic Image Matting: General and Specific Semantics

Semantic Image Matting

Salient Image Matting

Semantic Human Matting

Portrait Matting via Semantic and Detail Guidance

Disentangled Image Matting

Semantic-guided Automatic Natural Image Matting with Trimap Generation Network and Light-weight Non-local Attention

Towards Natural Image Matting in the Wild via Real-Scenario Prior

Coarse Semantic Guided Alpha Matting Via Simultaneous Foreground and Background Estimation

Robust Human Matting via Semantic Guidance

Matte Anything: Interactive Natural Image Matting with Segment Anything Models

PP-Matting: High-Accuracy Natural Image Matting

Cascaded Segmented Matting Network for Human Matting

Matting Anything

Boosting General Trimap-free Matting in the Real-World Image

Boosting Semantic Human Matting with Coarse Annotations

Weakly Supervised Image Matting Via Patch Clustering

Attention-guided Temporally Coherent Video Object Matting

Smart Scribbles for Image Matting

Hierarchical and Progressive Image Matting