From Composited to Real-world: Transformer-based Natural Image Matting

Yanfeng Wang, Lv Tang, Yijie Zhong, Bo Li
DOI: https://doi.org/10.1109/tcsvt.2023.3300731
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Image matting is an active research area in computer vision, and various trimap-free methods have been proposed to improve its performance. However, these methods do not consider the gap between composited and real-world images, which limits their ability to generalize. To address this issue, we propose a domain alignment (DA) module that consists of local region-wise alignment (LRA) and global harmonious alignment (GHA). The LRA aligns the most diverse pixels in the transparent regions of the foreground between composited and real images. The GHA, in turn, aligns the global harmonization of composited and real images, which helps the network choose appropriate semantics for real, harmonious images. Additionally, we design a transformer-based network with a dynamic attention pruning (DAP) mechanism that accurately locates domain-sensitive regions, allowing the DA module to work more effectively. Furthermore, we introduce a new dataset, the Real-world Matting Dataset (RM-1k), to advance the real-world matting task. Our method is evaluated on two composited benchmarks (Composite-1k and Distinctions-646) and two real-world datasets (AIM-500 and RM-1k), and the results show that it achieves robust performance on both composited and real-world images.
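Composited benchmarks are built by pasting a foreground onto a new background with the standard matting (compositing) equation I = αF + (1 − α)B, and this synthetic formation is the source of the composited/real gap the abstract describes. The sketch below shows that equation in NumPy and marks the semi-transparent pixels (0 < α < 1), the "transparent regions" where the abstract says LRA operates. It is a minimal illustration assuming float images in [0, 1]; the function names and the `eps` threshold are hypothetical, and it is not the authors' LRA implementation (how LRA picks the "most diverse" pixels inside this region is not specified here).

```python
import numpy as np

def composite(fg, bg, alpha):
    """Standard matting equation I = alpha * F + (1 - alpha) * B.

    fg, bg: float arrays of shape (H, W, 3) with values in [0, 1]
    alpha:  float array of shape (H, W) with values in [0, 1]
    """
    a = alpha[..., None]  # add a channel axis so alpha broadcasts over RGB
    return a * fg + (1.0 - a) * bg

def transparent_mask(alpha, eps=1e-3):
    """Hypothetical helper: boolean mask of semi-transparent pixels,
    i.e. eps < alpha < 1 - eps (neither pure background nor pure foreground)."""
    return (alpha > eps) & (alpha < 1.0 - eps)

# Toy 2x2 example: white foreground over a black background
fg = np.ones((2, 2, 3))
bg = np.zeros((2, 2, 3))
alpha = np.array([[0.0, 0.5],
                  [1.0, 0.25]])

img = composite(fg, bg, alpha)
print(img[..., 0])              # equals alpha here, since F = 1 and B = 0
print(transparent_mask(alpha))  # True only where 0 < alpha < 1
```

With a white foreground and a black background, each composited pixel equals its alpha value, which makes the blending easy to check by eye.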
engineering, electrical & electronic