Combine Coarse and Fine Cues: Multi-grained Fusion Network for Video-Based Person Re-identification

Chao Li, Lei Liu, Kai Lv, Hao Sheng, Wei Ke
DOI: https://doi.org/10.1007/978-3-319-99365-2_16
2018-01-01
Abstract: Video-based person re-identification aims to precisely match video sequences of pedestrians across non-overlapping cameras. Existing methods deal with this task by encoding each frame and aggregating the frame encodings over time. To increase the discriminative ability of video features, we propose an end-to-end framework called Multi-grained Fusion Network (MGFN), which keeps both global and local information by combining frame-level representations of different granularities. The final video features are generated by aggregating the multi-grained representations along both the spatial and temporal dimensions. Experiments indicate that our method achieves excellent performance on three widely used datasets: PRID-2011, iLIDS-VID, and MARS. On MARS in particular, MGFN surpasses the state-of-the-art result by \(11.5\%\).
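The abstract does not say how the granularities are defined or how the aggregation is performed, so the following is only a minimal sketch of the general idea, not MGFN itself. It assumes that a granularity means splitting a frame's feature map into equal horizontal stripes, that spatial aggregation is average pooling followed by concatenation, and that temporal aggregation is a plain mean over frames. The function names and the granularities (1, 2, 4) are hypothetical.

```python
from statistics import fmean


def pool_region(rows):
    """Average-pool feature-map rows (each row: W cells of C channels) into one C-dim vector."""
    cells = [cell for row in rows for cell in row]
    channels = len(cells[0])
    return [fmean(cell[c] for cell in cells) for c in range(channels)]


def multi_grained_frame_feature(feature_map, granularities=(1, 2, 4)):
    """Concatenate pooled stripe features at several granularities (assumed scheme).

    feature_map is a nested list of shape H x W x C. A granularity of 1 gives one
    global (coarse) vector; larger values give finer local stripe vectors.
    """
    height = len(feature_map)
    feature = []
    for parts in granularities:
        stripe = height // parts  # assumes H is divisible by every granularity
        for p in range(parts):
            feature.extend(pool_region(feature_map[p * stripe:(p + 1) * stripe]))
    return feature


def video_feature(frame_maps, granularities=(1, 2, 4)):
    """Average the multi-grained frame features over time (assumed temporal aggregation)."""
    frame_features = [multi_grained_frame_feature(m, granularities) for m in frame_maps]
    return [fmean(values) for values in zip(*frame_features)]
```

With these assumed granularities, each frame contributes 1 + 2 + 4 = 7 pooled vectors, so the video feature has 7 × C dimensions: the first C come from the whole frame (coarse cue) and the rest from progressively smaller stripes (fine cues).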