StructToken: Rethinking Semantic Segmentation with Structural Prior

Fangjian Lin, Zhanhao Liang, Junjun He, Miao Zheng, Shengwei Tian, Kai Chen
DOI: https://doi.org/10.48550/arxiv.2203.12612
2022-01-01
Abstract: In this paper, we present structure token (StructToken), a new paradigm for semantic segmentation. Viewing semantic segmentation as per-pixel classification, previous deep-learning-based methods first learn a per-pixel representation through an encoder and a decoder head, and then classify each pixel representation into a specific category to obtain the semantic masks. In contrast, we propose a structure-aware algorithm that takes structural information as a prior and constructs the semantic masks directly, without per-pixel classification. Specifically, given an input image, the learnable structure token interacts with the image representations to infer the final semantic masks. We explore three interaction approaches; the results not only outperform state-of-the-art methods but also contain more structural information. Experiments are conducted on three widely used datasets: ADE20K, Cityscapes, and COCO-Stuff 10K. We hope that structure token can serve as an alternative for semantic segmentation and inspire future research.
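The idea of a learnable token interacting with image representations to produce masks can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the three interaction approaches, so the single attention-style step below, the function name `token_to_masks`, the one-token-per-class layout, and the toy sizes are all assumptions chosen for illustration.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def token_to_masks(tokens, features):
    """Hypothetical interaction step (an assumption, not the paper's module).

    tokens:   K learnable vectors of length C, one per class (assumed layout).
    features: N pixel feature vectors of length C from an image encoder.
    Returns K mask logit maps, each a list of N scores.
    """
    scale = 1.0 / math.sqrt(len(features[0]))
    masks = []
    for t in tokens:
        # The token attends over all pixels and absorbs image content.
        attn = softmax([dot(t, f) * scale for f in features])
        updated = [
            t_c + sum(a * f[c] for a, f in zip(attn, features))
            for c, t_c in enumerate(t)
        ]
        # The updated token is compared with every pixel to form one mask.
        masks.append([dot(updated, f) for f in features])
    return masks

# Toy example: K=3 classes, N=6 pixels (e.g. a 2x3 feature map), C=4 channels.
random.seed(0)
K, N, C = 3, 6, 4
features = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
tokens = [[random.gauss(0, 1) for _ in range(C)] for _ in range(K)]

masks = token_to_masks(tokens, features)
# Reading out a label map from the K masks (for display only).
labels = [max(range(K), key=lambda k: masks[k][i]) for i in range(N)]
```

In this sketch the tokens would be trained parameters; here they are random so the example stays self-contained. The point is only the data flow: masks are built from token and feature interaction rather than from a classifier applied to each pixel representation.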