Abstract: Person re-identification (Re-ID) aims to retrieve a target person across non-overlapping cameras. With the rapid growth in the computing and storage capacity of edge sensors, performing person Re-ID on edge devices has become increasingly popular in recent years. Because the raw data recorded by edge devices need not be transmitted directly to a server, this scenario greatly improves data privacy and security. Edge-based person Re-ID also reduces the computation and transmission load on central servers. In this paper, we take the first step toward performing video-based person Re-ID on edge devices with limited computing and storage resources. To handle person tracklets extracted from video recordings, we design EdgeVPR, a novel lightweight, real-time video person Re-ID model based on the Transformer architecture. We use multi-level knowledge distillation to learn lightweight models from large server-side models. For the lightweight model, we propose a multi-scale spatio-temporal attention (MSTA) module to replace the original multi-head self-attention (MSA) layers in the Transformer. Our MSTA module not only captures both spatial and temporal information from tracklets but also greatly reduces computation compared with MSA layers. To address the challenges caused by occlusion or misclassification when generating person tracklets, we apply patch transformation during teacher model training and use contrastive learning to enhance the model's robustness. Since edge sensors often face different shooting environments and angles, we design a pluggable environment adapter for environment-oriented fine-tuning of the lightweight student model. We perform experiments on the MARS [1] and DukeMTMC-VideoReID [2] datasets. Results show that EdgeVPR achieves significantly better results than prior edge-based person Re-ID work.
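The abstract does not give the formulations of MSTA or of the multi-level distillation, so the following is only a minimal sketch of two generic ingredients, not EdgeVPR's implementation. First, assuming a factorized (divided) spatial-then-temporal attention in the style of TimeSformer as a stand-in for MSTA, it counts the multiply-accumulates of the attention matrix products to show why attending within each frame and then across frames is cheaper than joint MSA over all T×N tokens of a tracklet. Second, it computes a standard temperature-scaled soft-target distillation loss (Hinton et al.) plus a feature-matching MSE term as one hypothetical way to combine logit-level and feature-level supervision. All function names, the tracklet sizes (8 frames, 128 patches, 384 channels), the temperature, and the loss weights are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def attention_macs(num_frames: int, num_patches: int, dim: int) -> Dict[str, int]:
    """Multiply-accumulates of the two attention matmuls (Q @ K^T and A @ V).

    Q/K/V/output projections are omitted: they cost the same in both schemes.
    """
    tokens = num_frames * num_patches
    # Joint MSA: each of the T*N tokens attends to all T*N tokens.
    joint = 2 * tokens * tokens * dim
    # Divided attention (assumption, TimeSformer-style, not necessarily MSTA):
    # spatial attention among the N patches of each frame, then temporal
    # attention among the T frames at each patch position.
    spatial = 2 * num_frames * num_patches * num_patches * dim
    temporal = 2 * num_patches * num_frames * num_frames * dim
    return {"joint": joint, "divided": spatial + temporal}


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits: Sequence[float],
                     teacher_logits: Sequence[float],
                     temperature: float = 4.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


def feature_mse(student_feat: Sequence[float], teacher_feat: Sequence[float]) -> float:
    """Mean squared error between intermediate features of equal length."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def distillation_objective(student_logits: Sequence[float],
                           teacher_logits: Sequence[float],
                           student_feat: Sequence[float],
                           teacher_feat: Sequence[float],
                           alpha: float = 1.0, beta: float = 1.0,
                           temperature: float = 4.0) -> float:
    """Hypothetical two-level objective: logit-level plus feature-level terms."""
    return (alpha * soft_target_loss(student_logits, teacher_logits, temperature)
            + beta * feature_mse(student_feat, teacher_feat))


if __name__ == "__main__":
    # Illustrative tracklet: 8 frames, 128 patches per frame, 384 channels.
    cost = attention_macs(num_frames=8, num_patches=128, dim=384)
    print(cost, round(cost["joint"] / cost["divided"], 2))
    print(distillation_objective([1.0, 2.0, 0.5], [1.5, 2.5, 0.0],
                                 [0.1, 0.2], [0.0, 0.4]))
```

The ratio of the two counts simplifies to TN/(N+T), about 7.5 for these example sizes, and it grows with longer tracklets and finer patch grids. The abstract does not state how much MSTA's multi-scale design actually saves.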


EdgeVPR: Transformer-Based Real-Time Video Person Re-Identification at the Edge
