Mask Encoding: A General Instance Mask Representation for Object Segmentation

Rufeng Zhang,Tao Kong,Xinlong Wang,Mingyu You
DOI: https://doi.org/10.1016/j.patcog.2021.108505
IF: 8
2021-01-01
Pattern Recognition
Abstract:Instance segmentation is one of the most challenging tasks in computer vision, which requires separating each instance in pixels. To date, a low-resolution binary mask is the dominant paradigm for representation of instance mask. For example, the size of the predicted mask in Mask R-CNN is usually 28 x 28 . Generally, a low-resolution mask can not capture the object details well, while a high-resolution mask dramatically increases the training complexity. In this work, we propose a flexible and effective approach to encode the high-resolution structured mask to the compact representation which shares the advantages of high-quality and low-complexity. The proposed mask representation can be easily integrated into two-stage pipelines such as Mask R-CNN, improving mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset. Moreover, a novel single shot instance segmentation framework can be constructed by extending the existing one-stage detector with a mask branch for this instance representation. Our model shows its superiority over the explicit contour-based pipelines in accuracy with similar computational complexity. We also evaluate our method for video instance segmentation, achieving promising results on YouTube-VIS dataset. Code is available at: https://git.io/AdelaiDet (c) 2021 Elsevier Ltd. All rights reserved.
What problem does this paper attempt to address?