Abstract: Interactive portrait matting extracts from a given image the soft portrait that best matches the user's intent, as expressed through their inputs. Existing methods often underperform in complex scenarios, mainly for three reasons. (1) Most works use a tightly coupled network that directly predicts the matting result, which lacks interpretability and leads to inadequate modeling. (2) Existing works are limited to a single type of user input, which is ineffective for understanding intent and inefficient for the user to operate. (3) The multi-round nature of user interaction, which is crucial in practice, has been under-explored. To alleviate these limitations, we propose DFIMat, a decoupled framework that enables flexible interactive matting. Specifically, we first decouple the task into two sub-tasks: localizing target instances by understanding scene semantics and the flexible user inputs, and refining the result for instance-level matting. We observe a clear performance gain from decoupling, as it makes the sub-tasks easier to learn, and the flexible multi-type input further improves both effectiveness and efficiency. DFIMat also accounts for the multi-round interaction property: a contrastive reasoning module is designed to enhance cross-round refinement. Another limitation of the multi-person matting task is the lack of training data. We address this by introducing a new synthetic data generation pipeline that generates much more realistic samples than prior work. A new large-scale dataset, SMPMat, is subsequently established. Experiments verify the significant superiority of DFIMat. With it, we also investigate the roles of different input types, providing valuable principles for users. Our code and dataset can be found at https://github.com/JiaoSiyi/DFIMat.
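For readers unfamiliar with how synthetic multi-person matting samples are typically assembled, the sketch below illustrates the standard compositing equation I = αF + (1 − α)B applied to several foregrounds in turn. It is only a minimal illustration under stated assumptions, not the SMPMat pipeline described in the abstract: the function names `composite` and `composite_people` and the toy arrays are hypothetical, and the authors' pipeline presumably involves more than this layering step.

```python
import numpy as np


def composite(fg, alpha, bg):
    """Standard matting equation: I = alpha * F + (1 - alpha) * B."""
    a = alpha[..., None]  # broadcast the (H, W) alpha over the colour channels
    return a * fg + (1.0 - a) * bg


def composite_people(people, bg):
    """Layer several (foreground, alpha) pairs onto bg, back to front.

    Hypothetical helper for illustration only; not the authors' method.
    """
    image = bg.astype(np.float64)
    for fg, alpha in people:
        image = composite(fg, alpha, image)
    return image


# Toy example: a red "person" and a semi-transparent blue "person" on black.
h, w = 4, 6
bg = np.zeros((h, w, 3))
red = np.zeros((h, w, 3))
red[..., 0] = 1.0
blue = np.zeros((h, w, 3))
blue[..., 2] = 1.0

alpha_a = np.zeros((h, w))
alpha_a[:, :3] = 1.0   # opaque red in columns 0-2
alpha_b = np.zeros((h, w))
alpha_b[:, 2:5] = 0.5  # half-transparent blue in columns 2-4

image = composite_people([(red, alpha_a), (blue, alpha_b)], bg)
```

In column 2 the two toy instances overlap, so the composite mixes both colours. Overlap of this kind is what makes the per-instance alpha mattes in multi-person scenes ambiguous without user guidance.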

Unifying Automatic and Interactive Matting with Pretrained ViTs

Exploring the Interactive Guidance for Unified and Effective Image Matting

ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers

Attention-guided Temporally Coherent Video Object Matting

Towards Natural Image Matting in the Wild via Real-Scenario Prior

Deep Image Matting with Sparse User Interactions

Deep Interactive Image Matting With Feature Propagation

Situational Perception Guided Image Matting

Easy Matting - A Stroke Based Approach for Continuous Image Matting

Deep Automatic Natural Image Matting

Weakly Supervised Image Matting Via Patch Clustering

Mask Guided Matting via Progressive Refinement Network

DFIMat: Decoupled Flexible Interactive Matting in Multi-Person Scenarios

Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting

Real-Time Multi-Person Video Synthesis with Controllable Prior-Guided Matting

Video Instance Matting

In-Context Matting

Matte Anything: Interactive Natural Image Matting with Segment Anything Models

Improved Image Matting Via Real-time User Clicks and Uncertainty Estimation

User-Guided Deep Human Image Matting Using Arbitrary Trimaps

Robust Human Matting via Semantic Guidance