Scene Sketch Semantic Segmentation with Hierarchical Transformer

Jie Yang, Aihua Ke, Yaoxiang Yu, Bo Cai
DOI: https://doi.org/10.1016/j.knosys.2023.110962
IF: 8.139
2023-01-01
Knowledge-Based Systems
Abstract: Convolutional Neural Networks (CNNs) have long been the dominant approach to scene sketch semantic segmentation, but their performance appears to have plateaued because of the limited size of their local receptive fields. To address this problem, we propose SketchSeger, a hierarchical Transformer-based model for scene sketch semantic segmentation. Accurate scene sketch segmentation relies on both high-level semantics and low-level details. To improve segmentation performance, we design an MLP-based feature fusion module for the model decoder that efficiently merges feature maps captured at different scales. Compared with CNN-based models, SketchSeger shows a stronger capacity for contextual modeling and attains global receptive fields even in its shallow layers. Beyond the model architecture, the lack of large-scale pre-training datasets also poses a significant challenge to advancing scene sketch semantic segmentation. To promote further research, we propose a novel method for synthesizing hand-drawn-style scene sketches and use it to build a dataset of 300,000 annotated scene sketches. We conduct extensive experiments and visual analyses to validate the efficacy of the proposed SketchSeger model and the dataset synthesis approach. The results show that SketchSeger significantly outperforms state-of-the-art models of similar parameter scale on three benchmark datasets (SketchyScene, SKY-Scene, and TUB-Scene). Code and datasets are available at https://github.com/jayangcs/SketchSeger.
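To make the idea of an MLP-based multi-scale fusion decoder concrete, here is a minimal NumPy sketch in the spirit of SegFormer's all-MLP decoder. It is not the SketchSeger module: the four-stage strides (4, 8, 16, 32), the channel counts, the shared width `embed_dim`, the class count, nearest-neighbour upsampling (swapped in for the bilinear interpolation such decoders usually use), and the random placeholder weights are all assumptions for illustration. The names `pixel_linear`, `upsample_nearest`, and `mlp_fusion_decoder` are hypothetical. Each scale is projected to a common channel width by a per-pixel linear layer, upsampled to the finest resolution, concatenated, fused by another linear layer with a ReLU, and classified per pixel.

```python
import numpy as np

def pixel_linear(x, w, b):
    """Per-pixel linear layer (equivalent to a 1x1 convolution).
    x: (C_in, H, W), w: (C_out, C_in), b: (C_out,) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def upsample_nearest(x, out_h, out_w):
    """Nearest-neighbour upsampling; assumes integer scale factors."""
    _, h, w = x.shape
    return x.repeat(out_h // h, axis=1).repeat(out_w // w, axis=2)

def mlp_fusion_decoder(features, rng, embed_dim=64, num_classes=10):
    """Hypothetical MLP fusion of multi-scale maps; weights are random placeholders."""
    # Target resolution: the finest (first) feature map.
    _, out_h, out_w = features[0].shape
    unified = []
    for f in features:
        w = rng.standard_normal((embed_dim, f.shape[0])) * 0.02
        # 1) Project this scale to the shared channel width.
        projected = pixel_linear(f, w, np.zeros(embed_dim))
        # 2) Bring it to the finest spatial resolution.
        unified.append(upsample_nearest(projected, out_h, out_w))
    # 3) Concatenate along channels and fuse with a linear layer + ReLU.
    stacked = np.concatenate(unified, axis=0)  # (len(features) * embed_dim, H, W)
    w_fuse = rng.standard_normal((embed_dim, stacked.shape[0])) * 0.02
    fused = np.maximum(pixel_linear(stacked, w_fuse, np.zeros(embed_dim)), 0.0)
    # 4) Per-pixel classifier producing class logits.
    w_cls = rng.standard_normal((num_classes, embed_dim)) * 0.02
    return pixel_linear(fused, w_cls, np.zeros(num_classes))

rng = np.random.default_rng(0)
H, W = 64, 64                      # assumed input sketch size
channels = [32, 64, 160, 256]      # assumed per-stage channel counts
strides = [4, 8, 16, 32]           # assumed per-stage downsampling
features = [rng.standard_normal((c, H // s, W // s)) for c, s in zip(channels, strides)]
logits = mlp_fusion_decoder(features, rng)
labels = logits.argmax(axis=0)     # predicted class index per pixel
print(logits.shape, labels.shape)  # (10, 16, 16) (16, 16)
```

Using only per-pixel linear layers keeps the decoder light: the long-range context is expected to come from the Transformer encoder, so the decoder only has to align and merge the semantic (coarse) and detail (fine) scales.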
What problem does this paper attempt to address?