Abstract: Recently, with the advance of deep Convolutional Neural Networks (CNNs), person Re-Identification (Re-ID) has witnessed great success in various applications. However, because of the limited receptive fields of CNNs, it remains challenging to extract discriminative representations with a global view for persons captured by non-overlapping cameras. Meanwhile, Transformers have demonstrated a strong ability to model long-range dependencies in spatial and sequential data. In this work, we take advantage of both CNNs and Transformers and propose a novel learning framework named Hierarchical Aggregation Transformer (HAT) for high-performance image-based person Re-ID. To this end, we first propose a Deeply Supervised Aggregation (DSA) that recurrently aggregates hierarchical features from CNN backbones. With multi-granularity supervision, the DSA enhances multi-scale features for person retrieval, which differs considerably from previous methods. Then, we introduce a Transformer-based Feature Calibration (TFC) that integrates low-level detail information as a global prior for high-level semantic information. The proposed TFC is inserted into each level of the hierarchical features, yielding substantial performance improvements. To the best of our knowledge, this work is the first to take advantage of both CNNs and Transformers for image-based person Re-ID. Comprehensive experiments on four large-scale Re-ID benchmarks show that our method outperforms several state-of-the-art methods. The code is released at https://github.com/AI-Zhpp/HAT.
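The abstract describes DSA and TFC only at a high level. The following is a minimal NumPy sketch of the general idea, under loudly stated assumptions: every backbone stage has already been flattened into a token matrix with a shared channel width C; TFC-style calibration is modeled as single-head scaled dot-product cross-attention (queries from the deeper stage, keys and values from the aggregated shallower stages) with a residual connection; and deep supervision is represented only by returning one pooled embedding per level, where a loss could be attached. The helper names (`softmax`, `calibrate`, `aggregate_levels`), the shapes, and the random weights are hypothetical. This is not the HAT implementation, which is available in the released repository.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def calibrate(high, low, w_q, w_k, w_v):
    """Hypothetical TFC-style step: refine deep tokens with shallow tokens.

    high: (N_h, C) tokens from a deeper, more semantic stage (queries).
    low:  (N_l, C) tokens from shallower, detail-rich stages (keys/values).
    Returns (N_h, C): the input tokens plus an attention-weighted summary of `low`.
    """
    q, k, v = high @ w_q, low @ w_k, low @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N_h, N_l) similarity logits
    attn = softmax(scores, axis=-1)          # each query's weights sum to 1
    return high + attn @ v                   # residual connection


def aggregate_levels(levels, params):
    """Hypothetical DSA-style loop: fuse stages recurrently, shallow to deep.

    levels: list of (N_i, C) token matrices, ordered from shallow to deep.
    params: one (w_q, w_k, w_v) tuple per stage after the first.
    Returns one pooled (C,) embedding per stage; in training, a loss could be
    attached to each one as a stand-in for multi-granularity supervision.
    """
    fused = levels[0]
    embeddings = [fused.mean(axis=0)]
    for tokens, (w_q, w_k, w_v) in zip(levels[1:], params):
        fused = calibrate(tokens, fused, w_q, w_k, w_v)
        embeddings.append(fused.mean(axis=0))
    return embeddings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = 8
    # Hypothetical token counts: spatial resolution shrinks with depth.
    levels = [rng.standard_normal((n, C)) for n in (64, 32, 16, 8)]
    params = [tuple(0.1 * rng.standard_normal((C, C)) for _ in range(3))
              for _ in levels[1:]]
    embs = aggregate_levels(levels, params)
    print([e.shape for e in embs])  # [(8,), (8,), (8,), (8,)]
```

The residual connection keeps the deeper stage's semantic features intact when the attention contributes little, and mean pooling is only a placeholder for whatever pooling and classification heads an actual model would use at each level.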

Learning Convolutional Multi-Level Transformers for Image-Based Person Re-Identification

Person Re-identification Based on Transform Algorithm

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

RETRACTED CHAPTER: Person Re-identification Based on Transform Algorithm

Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification

Learning transformer-based attention region with multiple scales for occluded person re-identification

HAT: Hierarchical Aggregation Transformers for Person Re-identification

Multi-Scale Transformer-Based Matching Network for Generalizable Person Re-Identification

Transformer Based Multi-Grained Features for Unsupervised Person Re-Identification

Heterogeneous feature-aware Transformer-CNN coupling network for person re-identification

Tensor Multi-task Learning for Person Re-identification

Multi-Scale Triplet CNN for Person Re-Identification

Person Re-identification by Deep Learning Multi-scale Representations

Discriminative Spatial Feature Learning for Person Re-Identification

Learning the Meta Feature Transformer for Unsupervised Person Re-Identification

Multi-scale Deep Learning Architectures for Person Re-identification

Transformer-based Contrastive Learning for Unsupervised Person Re-Identification

Large-scale Person Re-Identification As Retrieval

TransReID: Transformer-based Object Re-Identification

Contextual Multi-Scale Feature Learning for Person Re-Identification

Learning Deep Context-aware Features over Body and Latent Parts for Person Re-identification