Effective Local-Global Transformer for Natural Image Matting

Liangpeng Hu, Yating Kong, Jide Li, Xiaoqiang Li
DOI: https://doi.org/10.1109/tcsvt.2023.3234983
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Learning-based matting methods have long been dominated by convolutional neural networks. These methods mainly propagate the alpha matte according to the similarity between unknown and known regions. However, correlations between pixels in unknown and known regions are limited by the insufficient receptive fields of common convolutional neural networks, which leads to inaccurate estimation for pixels in unknown regions that are far away from known regions. In this paper, we propose an Effective Local-Global Transformer for natural image matting (ELGT-Matting), which further expands receptive fields to establish a wide range of correlations between unknown and known regions. The core module is the effective local-global transformer block, and each block consists of two modules: 1) a Window-Level Global MSA (Multi-head Self-Attention) module, which learns global context features among windows; and 2) a Local-Global Window MSA module, which combines coarse global context features with the corresponding fine local window features to help local window self-attention capture both local and context information. Experiments demonstrate that ELGT-Matting performs favorably against other competitive approaches on the Composition-1K, Distinctions-646, and real-world AIM-500 datasets. In particular, we achieve a new state-of-the-art result on Composition-1K with an MSE of 0.00374.
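The two-module block described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not say how window tokens are formed or how global and local features are combined, so the mean pooling, the single attention head, the absence of learned projections, and the choice to append the global token to each window's keys and values are all assumptions made for illustration. The function names (`window_partition`, `local_global_block`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Plain scaled dot-product attention over the last two axes
    # (single head, no learned projections: a simplification).
    scale = 1.0 / np.sqrt(q.shape[-1])
    weights = softmax(q @ np.swapaxes(k, -1, -2) * scale)
    return weights @ v

def window_partition(x, ws):
    # (H, W, C) -> (num_windows, ws*ws, C); H and W must be multiples of ws.
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def local_global_block(x, ws):
    windows = window_partition(x, ws)                 # (N, ws*ws, C)

    # Step 1 (assumed): summarize each window as one token by mean pooling,
    # then run self-attention among the window tokens to get global context.
    tokens = windows.mean(axis=1)                     # (N, C)
    global_ctx = attention(tokens, tokens, tokens)    # (N, C)

    # Step 2 (assumed): each window attends to its own pixels plus its
    # coarse global context token, appended to the keys and values.
    kv = np.concatenate([windows, global_ctx[:, None, :]], axis=1)
    return attention(windows, kv, kv)                 # (N, ws*ws, C)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 8, 4))             # toy feature map
    out = local_global_block(feat, ws=4)
    print(out.shape)
```

In this sketch every pixel in a window can draw on information from all other windows through the single appended context token, which is one simple way to widen the effective receptive field beyond the local window.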
engineering, electrical & electronic