TFRNet: Semantic Segmentation Network with Token Filtration and Refinement Method

Yingdong Ma, Xiaoyu Hu
DOI: https://doi.org/10.1109/tmm.2024.3378465
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Transformer-based semantic segmentation has developed rapidly. Vision transformers (ViTs) rely on the self-attention mechanism, which uses all image patches to compute long-range dependencies, and they treat all tokens as equally important in the self-attention calculation. However, it has been shown that image tokens contribute differently to the final prediction. In this paper, we propose a token filtration method to select informative tokens. These informative tokens are then used to reweight the token sequence so that the transformer can focus on important image tokens and produce more accurate predictions. Meanwhile, due to the lack of local information, transformer-based segmentation usually produces incomplete object structures and coarse boundaries. To address this, we introduce a segmentation refinement method that refines transformer segmentation results by integrating transformer outputs with convolutional features of the input image to generate a refined prediction. Finally, we present the token filtration and refinement network (TFRNet), which adopts the proposed token filtration and refinement methods to improve segmentation performance. We evaluate TFRNet on the ADE20K and Cityscapes datasets. Experimental results show that the proposed method outperforms other state-of-the-art approaches.
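The abstract states that informative tokens are selected and then used to reweight the token sequence, but it does not say how informativeness is measured or how the reweighting is computed. The following is a minimal, hypothetical sketch of that general idea, not TFRNet's published method. Assumptions: a single attention head with no learned projections, a token's informativeness taken as the average attention it receives, and each token scaled by 1 plus the share of its attention spent on the selected tokens. All function names (`attention_weights`, `select_informative`, `reweight_tokens`, `token_filtration`) are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Token = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens: Sequence[Token]) -> List[List[float]]:
    """Single-head scaled dot-product attention weights A[i][j].

    Simplification (assumption): queries and keys are the raw tokens,
    with no learned projection matrices.
    """
    scale = 1.0 / math.sqrt(len(tokens[0]))
    return [
        softmax([sum(qc * kc for qc, kc in zip(q, k)) * scale for k in tokens])
        for q in tokens
    ]


def select_informative(attn: List[List[float]], k: int) -> List[int]:
    """Hypothetical filtration rule: keep the k tokens that receive
    the most attention on average across all queries."""
    n = len(attn)
    received = [sum(attn[i][j] for i in range(n)) / n for j in range(n)]
    ranked = sorted(range(n), key=lambda j: received[j], reverse=True)
    return sorted(ranked[:k])


def reweight_tokens(
    tokens: Sequence[Token], attn: List[List[float]], informative: List[int]
) -> Tuple[List[Token], List[float]]:
    """Hypothetical reweighting: scale token i by (1 + w_i), where w_i is
    the fraction of token i's attention that goes to informative tokens."""
    keep = set(informative)
    weights = [sum(row[j] for j in keep) for row in attn]
    reweighted = [[(1.0 + w) * c for c in tok] for tok, w in zip(tokens, weights)]
    return reweighted, weights


def token_filtration(
    tokens: Sequence[Token], k: int
) -> Tuple[List[int], List[Token], List[float]]:
    """Select k informative tokens, then reweight the full sequence."""
    attn = attention_weights(tokens)
    informative = select_informative(attn, k)
    reweighted, weights = reweight_tokens(tokens, attn, informative)
    return informative, reweighted, weights


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    idx, new_tokens, w = token_filtration(toy, k=2)
    print("informative token indices:", idx)
    print("reweighting factors:", [round(1.0 + x, 3) for x in w])
```

Because each attention row sums to 1, every weight lies in [0, 1], so tokens are amplified by at most a factor of 2 and never suppressed. A real model would use learned query/key projections, multiple heads, and a learned or thresholded selection rule.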
computer science, information systems, telecommunications, software engineering