Cross-Modal Pixel-and-Stroke representation aligning networks for free-hand sketch recognition
Yang Zhou, Jin Wang, Jingru Yang, Ping Ni, Guodong Lu, Heming Fang, Zhihui Li, Huan Yu, Kaixiang Huang
DOI: https://doi.org/10.1016/j.eswa.2023.122505
IF: 8.5
2024-04-01
Expert Systems with Applications
Abstract: We consider the cross-modal alignment problem for free-hand sketches. Given a sequence of strokes and a rasterized image, the objective is to enhance sketch recognition performance through cross-modal interactions. Existing works mostly employ simple weighted addition and concatenation for late fusion, or shallow attention layers for cross-modal alignment. Because the sketch modalities are highly heterogeneous, these methods fail to capture sufficiently meaningful feature representations. In this paper, we propose CMPS, a sketch recognition framework for aligning Cross-Modal Pixel-and-Stroke representations, which includes two novel components: the Semantic-Temporal Alignment Rasterization (STAR) module and the Pixel-Stroke Alignment (PSA) module. STAR aligns strokes with the image at the semantic and temporal levels during the rasterization preprocessing phase by utilizing color variations in the RGB space of the sketch. PSA, through its pre-alignment and post-alignment, learns to align semantic connections at both the pixel and stroke levels, capturing cross-modal dependencies rather than relying on shallow matrix operations for interaction. Additionally, we introduce a concise stroke processing network called StrokeFormer. It extracts two hierarchical features, point-level and stroke-level, based on the formation mechanism of sketches. StrokeFormer outperforms most RNN-based and CNN-based models by a significant margin. Our experimental results demonstrate that the proposed CMPS achieves new state-of-the-art performance on the Google QuickDraw-414K dataset and the TU-Berlin dataset. The code is available at https://github.com/WoodratTradeCo/CMPS.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science
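
For orientation, the late-fusion baselines that the abstract contrasts CMPS with (weighted addition and concatenation of per-modality features) might look roughly like the sketch below. This is an illustration only and is not taken from the CMPS repository: the function names, the mixing weight `alpha`, and the use of plain Python lists as feature vectors are all assumptions made for the example.

```python
from typing import List


def weighted_sum_fusion(pixel_feat: List[float],
                        stroke_feat: List[float],
                        alpha: float = 0.5) -> List[float]:
    """Late fusion by weighted addition (hypothetical helper).

    `alpha` weights the pixel (image) branch and `1 - alpha` the stroke
    branch. Both feature vectors must have the same length.
    """
    if len(pixel_feat) != len(stroke_feat):
        raise ValueError("feature vectors must have the same length")
    return [alpha * p + (1.0 - alpha) * s
            for p, s in zip(pixel_feat, stroke_feat)]


def concat_fusion(pixel_feat: List[float],
                  stroke_feat: List[float]) -> List[float]:
    """Late fusion by concatenation (hypothetical helper).

    The fused vector simply stacks the two modality features; no
    interaction between them is modeled at this stage.
    """
    return list(pixel_feat) + list(stroke_feat)


# Toy features standing in for the outputs of an image encoder and a
# stroke-sequence encoder (values are made up for illustration).
pixel = [1.0, 2.0]
stroke = [3.0, 4.0]
fused_sum = weighted_sum_fusion(pixel, stroke, alpha=0.5)
fused_cat = concat_fusion(pixel, stroke)
```

In both variants the two branches are encoded independently and combined only at the end, which is the limitation the abstract attributes to prior work and that the PSA module is described as addressing.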