Abstract: The objective of the occluded person re-identification (ReID) task is to retrieve the same person across different camera views when the pedestrian's body is partially occluded. This task poses two main challenges: 1) pedestrians are often occluded by other persons or objects, and 2) pedestrians change poses. Moreover, these two issues often occur simultaneously. Although many occluded person ReID algorithms have been proposed, most existing methods handle only one of these issues well and largely ignore the other. In this work, a novel semantic perception and CNN-transformer hybrid network (abbreviated as SPH) is proposed for occluded person ReID. It consists of a CNN-based human semantic perception stream and a transformer-based pose perception stream. In the former, a human semantic auxiliary module and a human semantic perception module are designed to obtain human semantic information, from which multi-granularity region features of the human body are extracted to address occlusion. In the latter, a token-based pose integration module is proposed to obtain the corresponding patch for each pose key-point, together with the relative position information, to address changes in pedestrian pose. These two streams are jointly optimized in a unified framework. In addition, to further address occlusion, a human completion strategy is proposed for the query sample, in which the gallery samples are used to complete the missing parts of the query. Extensive experimental results on three public occluded person ReID datasets, Occluded-DukeMTMC, P-DukeMTMC-reID, and Occluded-REID, demonstrate that the proposed method outperforms all state-of-the-art (SOTA) occluded person ReID methods in terms of mAP and Rank-1. Compared with PAT (CVPR21) on the Occluded-DukeMTMC and Occluded-REID datasets, the improvements in mAP/Rank-1 reached 10.1%/7.4% and 10%/1%, respectively. Moreover, when TransReID (ICCV21) was used, SPH achieved improvements of 4.5% (mAP) and 5.5% (Rank-1) on the Occluded-DukeMTMC dataset.
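
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how Rank-1 and mAP are commonly computed from a query-gallery distance matrix. It is not the authors' evaluation code: the function name `rank1_and_map` and the toy data are hypothetical, and the same-camera filtering used by standard ReID evaluation protocols is omitted for brevity.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mean average precision (mAP).

    dist[i][j] is the distance between query i and gallery item j
    (smaller means more similar). Illustrative only: camera-based
    filtering of gallery items is not applied here.
    """
    rank1_hits = 0
    average_precisions = []
    for i, qid in enumerate(query_ids):
        # Sort gallery indices from nearest to farthest.
        order = sorted(range(len(gallery_ids)), key=lambda j: dist[i][j])
        matches = [gallery_ids[j] == qid for j in order]
        if not any(matches):
            continue  # skip queries that have no true match in the gallery
        rank1_hits += int(matches[0])
        # Average precision: mean of the precision at each correct hit.
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    n = len(average_precisions)
    if n == 0:
        raise ValueError("no query has a matching gallery identity")
    return rank1_hits / n, sum(average_precisions) / n


# Toy example (hypothetical data): two queries, three gallery images.
dist = [
    [0.1, 0.5, 0.9],  # query 0 (identity 1)
    [0.2, 0.8, 0.3],  # query 1 (identity 2)
]
query_ids = [1, 2]
gallery_ids = [1, 2, 1]

rank1, mean_ap = rank1_and_map(dist, query_ids, gallery_ids)
print(f"Rank-1: {rank1:.3f}, mAP: {mean_ap:.3f}")  # Rank-1: 0.500, mAP: 0.583
```

In the toy data, query 0 finds a correct match at rank 1 (and another at rank 3), while query 1 finds its only match at rank 3, so Rank-1 is 1/2 and mAP is the mean of 5/6 and 1/3.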

CNN Attention Enhanced ViT Network for Occluded Person Re-Identification

RETRACTED CHAPTER: Person Re-identification Based on Transform Algorithm

Parallel Dense Vision Transformer and Augmentation Network for Occluded Person Re-identification

Joint Convolutional and Self-Attention Network for Occluded Person Re-Identification

Occluded pedestrian re-identification via Res-ViT double-branch hybrid network

Occluded person re-identification based on parallel triplet augmentation and parameter-free token spatial attention

AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification

Feature Refinement and Filter Network for Person Re-Identification

Convolutional and Transformer Fusion Network Based on Cross-Attention for Occluded Person Re-identification

Occlude Them All: Occlusion-Aware Attention Network for Occluded Person Re-ID

Spatial-Channel Enhanced Transformer for Visible-Infrared Person Re-Identification

Recurrent Deep Attention Network for Person Re-Identification

PersonViT: Large-scale Self-supervised Vision Transformer for Person Re-Identification

AMC-Net: Attentive Modality-Consistent Network for Visible-Infrared Person Re-Identification

Learning transformer-based attention region with multiple scales for occluded person re-identification

Feature attention fusion network for occluded person re-identification

Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-identification

Dual-stream Transformer with Distribution Alignment for Visible-Infrared Person Re-Identification

A Semantic Perception and CNN-Transformer Hybrid Network for Occluded Person Re-identification

Enhanced visible–infrared person re-identification based on cross-attention multiscale residual vision transformer