A Multi-Scale Spatial-Temporal Attention Model for Person Re-Identification in Videos

Wei Zhang, Xuanyu He, Xiaodong Yu, Weizhi Lu, Zhengjun Zha, Qi Tian
DOI: https://doi.org/10.1109/TIP.2019.2959653
IF: 10.6
2020-01-01
IEEE Transactions on Image Processing
Abstract: In this paper, we propose a novel deep neural network-based attention model that learns representative local regions from a video sequence for person re-identification. Specifically, we propose a multi-scale spatial-temporal attention (MSTA) model to measure the regions of each frame at different scales from the perspective of the whole video sequence. Compared to traditional temporal attention models, MSTA focuses on exploiting the importance of the local regions of each frame to the whole video representation in both the spatial and temporal domains. A new training strategy is designed for the proposed model by combining the image-to-image mode with the video-to-video mode. Extensive experiments on benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art methods.
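The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea it describes: scoring local regions of every frame at several scales, normalizing those scores jointly over space and time, and pooling them into one video-level feature. Everything here is an assumption made for illustration, not the authors' method: the function names (`pool_regions`, `msta_sketch`), the horizontal-stripe region layout, the scales `(1, 2, 4)`, the single linear scoring vector `score_weights`, and the random toy input.

```python
import math
import random

def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pool_regions(frame, num_regions):
    """Split a frame's stripe features into contiguous groups and average each.

    Assumes len(frame) is divisible by num_regions (hypothetical layout).
    """
    size = len(frame) // num_regions
    return [mean_vectors(frame[i * size:(i + 1) * size])
            for i in range(num_regions)]

def msta_sketch(video, score_weights, scales=(1, 2, 4)):
    """Toy multi-scale spatial-temporal attention pooling.

    video: T frames, each a list of H stripe features with C channels.
    score_weights: hypothetical scoring vector of length C (learned in practice).
    Returns the concatenated per-scale feature and the attention per scale.
    """
    dim = len(score_weights)
    feature, attention = [], {}
    for s in scales:
        # Collect the regions of every frame at this scale (time x space).
        regions = [r for frame in video for r in pool_regions(frame, s)]
        # Score each region with a dot product against the scoring vector.
        scores = [sum(w * x for w, x in zip(score_weights, r)) for r in regions]
        # Normalize jointly over all frames and regions, so each region is
        # weighted relative to the whole sequence rather than its own frame.
        weights = softmax(scores)
        pooled = [sum(w * r[c] for w, r in zip(weights, regions))
                  for c in range(dim)]
        feature.extend(pooled)
        attention[s] = weights
    return feature, attention

# Toy input: 3 frames, 4 stripes per frame, 5 channels (random values).
random.seed(0)
T, H, C = 3, 4, 5
video = [[[random.random() for _ in range(C)] for _ in range(H)]
         for _ in range(T)]
score_weights = [random.uniform(-1, 1) for _ in range(C)]
feature, attention = msta_sketch(video, score_weights)
```

In this sketch the softmax runs over all regions of all frames at once, which is one simple way to express "importance to the whole video representation"; a purely temporal attention model would instead assign a single weight per frame.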