Siamese transformer with hierarchical concept embedding for fine-grained image recognition

Yilin Lyu, Liping Jing, Jiaqi Wang, Mingzhe Guo, Xinyue Wang, Jian Yu
DOI: https://doi.org/10.1007/s11432-022-3586-y
2023-02-12
Science China Information Sciences
Abstract: Distinguishing the subtle differences among fine-grained images from subordinate concepts of a concept hierarchy is a challenging task. In this paper, we propose a Siamese transformer with hierarchical concept embedding (STrHCE), which contains two transformer subnetworks sharing all configurations; each subnetwork is equipped with hierarchical semantic information at different concept levels for fine-grained image embeddings. In particular, one subnetwork processes coarse-scale patches to learn the discriminative regions with the aid of the transformer's innate multi-head self-attention mechanism. The other subnetwork processes finer-scale patches, which are adaptively sampled from the discriminative regions, to capture subtle yet discriminative visual cues and eliminate redundant information. STrHCE connects the two subnetworks through a score margin adjustor that forces the most discriminative regions to generate more confident predictions. Extensive experiments on four commonly used benchmark datasets (CUB-200-2011, FGVC-Aircraft, Stanford Dogs, and NABirds) empirically demonstrate the superiority of the proposed STrHCE over state-of-the-art baselines.
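The two ideas that link the subnetworks in the abstract, picking discriminative regions from attention and pushing the finer-scale branch toward more confident predictions, can be illustrated with a small sketch. This is not the authors' implementation: the function names `top_k_patches` and `score_margin_loss`, the top-k selection rule, the hinge form of the loss, and the default `margin` value are all assumptions made for illustration, since the abstract does not define the score margin adjustor.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_patches(cls_attention, k):
    """Hypothetical region selection: indices of the k coarse patches that
    receive the highest attention, returned in ascending index order."""
    ranked = sorted(range(len(cls_attention)),
                    key=lambda i: cls_attention[i], reverse=True)
    return sorted(ranked[:k])


def score_margin_loss(coarse_logits, fine_logits, label, margin=0.05):
    """Assumed hinge-style stand-in for a score margin adjustor: the loss is
    zero only when the fine branch's probability for the true class exceeds
    the coarse branch's probability by at least `margin`."""
    p_coarse = softmax(coarse_logits)[label]
    p_fine = softmax(fine_logits)[label]
    return max(0.0, p_coarse - p_fine + margin)


# Toy attention over five coarse patches; patches 1 and 3 score highest.
selected = top_k_patches([0.05, 0.30, 0.10, 0.40, 0.15], k=2)

# The fine branch is more confident on class 0, so no penalty is incurred.
loss_ok = score_margin_loss([2.0, 1.0, 0.0], [4.0, 1.0, 0.0], label=0)

# With the branches swapped, the fine branch is less confident and is penalized.
loss_bad = score_margin_loss([4.0, 1.0, 0.0], [2.0, 1.0, 0.0], label=0)
```

In a full model, the selected indices would determine where finer-scale patches are sampled, and a term of this kind would be added to the classification losses of both branches; how STrHCE actually does either is described in the paper itself, not here.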
Computer Science, Information Systems; Engineering, Electrical & Electronic