EdgeVPR: Transformer-Based Real-Time Video Person Re-Identification at the Edge

Meng Sun, Ju Ren, Yaoxue Zhang
DOI: https://doi.org/10.1109/icdcs60910.2024.00011
2024-01-01
Abstract: Person re-identification (Re-ID) aims to search for a target person across non-overlapping cameras. With the rapid growth in the computing and storage capacity of edge sensors, performing person Re-ID on edge devices has become increasingly popular in recent years. Since the raw data recorded by edge devices does not have to be transmitted directly to the server, this scenario greatly improves data privacy and security. Edge-based person Re-ID also reduces the computation and transmission load on central servers. In this paper, we take the first step in performing video-based person Re-ID on edge devices with limited computing and storage resources. To handle person tracklets extracted from video recordings, we design EdgeVPR, a novel lightweight real-time video person Re-ID model based on the Transformer architecture. We use multi-level knowledge distillation to learn lightweight models from large server-side models. For the lightweight model, we propose a multi-scale spatio-temporal attention (MSTA) module to replace the original multi-head self-attention (MSA) layers in the Transformer. Our MSTA module not only captures both spatial and temporal information from tracklets but also greatly reduces computation compared with MSA layers. To address the challenges caused by occlusion or misclassification when generating person tracklets, we perform patch transformation during teacher model training and use contrastive learning to enhance the model's robustness. Because edge sensors often face different shooting environments and angles, we also design a pluggable environment adapter for environment-oriented fine-tuning of the lightweight student model. We perform experiments on the MARS dataset [1] and the DukeMTMC-VideoReID dataset [2]. Results show that EdgeVPR achieves significantly better results than prior edge-based person Re-ID work.
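The abstract does not specify how its multi-level distillation is formulated. As a rough illustration of the general idea of learning a lightweight student from a large teacher, the sketch below shows a generic single-level, logit-based distillation loss (temperature-softened KL divergence). This is a standard textbook formulation, not the EdgeVPR method; the function names and the temperature value are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper: a generic logit-distillation loss, not the
    multi-level scheme described in the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(p_teacher, p_student)
        if p > 0.0
    )
    return (temperature ** 2) * kl


# Example: a student whose logits differ from the teacher's is penalized,
# while an exact match gives a loss of zero.
teacher = [2.0, 0.5, -1.0]
matching_student = [2.0, 0.5, -1.0]
differing_student = [0.0, 0.0, 0.0]
loss_match = distillation_loss(matching_student, teacher)
loss_differ = distillation_loss(differing_student, teacher)
```

In practice such a term is usually combined with the ordinary task loss on ground-truth identities; how EdgeVPR weights or stacks its distillation levels is not stated in the abstract.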