Parallel Attention with Weighted Efficient Network for Video-Based Person Re-Identification

Junting Yang, Zuliu Yang, Jing Zhou, Yong Zhao, Qifei Dai, Fuchi Li
DOI: https://doi.org/10.1145/3461353.3461357
2021-01-01
Abstract: In this paper, we propose a new approach to three problems that traditional video-based Re-ID methods leave unsolved: the independent treatment of temporal and spatial information, shallow feature extraction, and heavy computation. Because the limited feature-extraction ability of traditional networks causes errors that ripple into later stages, we design an attention network named Parallel Spatio-Temporal Attention (PSTA) to fuse spatio-temporal features. After deep features are extracted, existing methods must stack convolutional operations to obtain large receptive fields; we instead use the Non-local operation to capture long-range dependencies directly. Within the Non-local method, we propose an Attention-Like Similarity (ALS) that adaptively learns the weights of the similarity matrix and then filters out redundant similarities. To reduce the high complexity of the Non-local method while maintaining accuracy, we perform Spatial Pyramid Pooling (SPP) inside the Non-local structure, which both lowers the complexity and combines multi-scale features. Extensive experiments with ablation analysis demonstrate the effectiveness of our methods, and state-of-the-art results are achieved on large-scale video datasets.
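To illustrate why pooling inside a Non-local block lowers its cost, here is a minimal NumPy sketch under stated assumptions: an embedded-Gaussian (softmax) similarity, per-frame average pooling with hypothetical pyramid levels (1, 2, 4), and random projection weights. The names `spp_pool` and `spp_non_local` are hypothetical. This is not the authors' implementation: it does not model ALS or PSTA, whose details the abstract does not give. It only shows that pooling the keys and values shrinks the similarity matrix from N x N to N x M.

```python
import numpy as np

def spp_pool(x, bins=(1, 2, 4)):
    """Spatial pyramid pooling of one (H, W, C) frame (assumed average pooling).

    Splits the frame into b x b grids for each b in `bins` and averages each
    cell, returning sum(b*b) pooled vectors. Assumes every b <= H and b <= W.
    """
    H, W, C = x.shape
    cells = []
    for b in bins:
        hs = np.linspace(0, H, b + 1).astype(int)  # row edges of the b x b grid
        ws = np.linspace(0, W, b + 1).astype(int)  # column edges
        for i in range(b):
            for j in range(b):
                cells.append(x[hs[i]:hs[i + 1], ws[j]:ws[j + 1]].mean(axis=(0, 1)))
    return np.stack(cells)  # (S, C) with S = sum(b*b)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spp_non_local(x, w_theta, w_phi, w_g, w_out, bins=(1, 2, 4)):
    """Non-local block whose keys and values are SPP-pooled (hypothetical sketch).

    x: (T, H, W, C) clip features. Queries come from all N = T*H*W positions;
    keys/values come from M = T*sum(b*b) pooled positions, so the similarity
    matrix is N x M instead of N x N.
    """
    T, H, W, C = x.shape
    q = x.reshape(-1, C) @ w_theta                           # (N, Ck) queries
    pooled = np.concatenate([spp_pool(f, bins) for f in x])  # (M, C) pooled keys/values
    k = pooled @ w_phi                                       # (M, Ck)
    v = pooled @ w_g                                         # (M, Ck)
    attn = softmax(q @ k.T, axis=-1)                         # (N, M) similarity matrix
    y = attn @ v                                             # (N, Ck) aggregated context
    return x + (y @ w_out).reshape(T, H, W, C)               # residual connection

# Usage with random features (shapes chosen only for illustration).
rng = np.random.default_rng(0)
T, H, W, C, Ck = 4, 16, 8, 32, 16
x = rng.standard_normal((T, H, W, C))
w_theta, w_phi, w_g = (rng.standard_normal((C, Ck)) * 0.1 for _ in range(3))
w_out = rng.standard_normal((Ck, C)) * 0.1
y = spp_non_local(x, w_theta, w_phi, w_g, w_out)
print(y.shape)        # (4, 16, 8, 32)
N, M = T * H * W, T * (1 + 4 + 16)
print(N * N, N * M)   # 262144 43008
```

Only the keys and values are pooled: the queries keep full resolution so every position still receives an output, while the quadratic term that dominates the cost now scales with the small number of pyramid cells rather than with H*W.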