Sketchformer++: A Hierarchical Transformer Architecture for Vector Sketch Representation

Pengfei Xu, Banhuai Ruan, Youyi Zheng, Hui Huang
DOI: https://doi.org/10.1007/978-981-97-2095-8_2
2024-01-01
Abstract: With the rising ubiquity of digital touch devices and sketch-based interfaces, freehand sketching has become an essential mode of visual communication. Nevertheless, interpreting these often ambiguous and sparse sketches poses challenges for computers. This paper presents Sketchformer++, a hierarchical transformer architecture for the neural representation of vector sketches. It treats a vector sketch as a three-level structure, i.e., sketch level, stroke level, and segment level. Three self-attention modules are adopted in the network architecture, corresponding to the sketch hierarchy. The semantics of sketches are aggregated from local to global, resulting in neural representations of sketches. Extensive experiments show that Sketchformer++ exhibits superior performance in various downstream tasks, including sketch reconstruction, sketch recognition, and sketch semantic segmentation, demonstrating its robustness and effectiveness in sketch representation.
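To make the three-level, local-to-global aggregation described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a sketch is a list of strokes, a stroke is a list of (x, y) points, and a segment is a pair of consecutive points. The helper names (`self_attention`, `mean_pool`, `encode`, `encode_sketch`) are hypothetical, the attention uses identity projections instead of learned weights, and mean pooling is an assumed aggregation step; the paper's actual modules may differ on all of these points.

```python
import math

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections (no learned weights),
    used here only to illustrate the data flow.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def encode(tokens):
    """One hierarchy level: attend over the tokens, then pool them."""
    return mean_pool(self_attention(tokens))

def encode_sketch(sketch):
    """Aggregate segment -> stroke -> sketch (local to global).

    Assumption: every stroke has at least two points, so it
    contains at least one segment.
    """
    stroke_vecs = []
    for stroke in sketch:
        # Segment level: each segment is a pair of consecutive points.
        segments = [[list(p), list(q)] for p, q in zip(stroke, stroke[1:])]
        segment_vecs = [encode(seg) for seg in segments]
        # Stroke level: attend over the segment embeddings of this stroke.
        stroke_vecs.append(encode(segment_vecs))
    # Sketch level: attend over all stroke embeddings.
    return encode(stroke_vecs)

# Hypothetical two-stroke sketch with coordinates in the unit square.
sketch = [
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (0.5, 0.5)],
]
embedding = encode_sketch(sketch)
```

Each level reuses the same attend-then-pool step, so the only thing that changes between levels is what counts as a token: points within a segment, segments within a stroke, or strokes within the sketch.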