Image-to-video person re-identification using three-dimensional semantic appearance alignment and cross-modal interactive learning

Wei Shi, Hong Liu, Mengyuan Liu
DOI: https://doi.org/10.1016/j.patcog.2021.108314
IF: 8
2022-01-01
Pattern Recognition
Abstract:
• A deep image-to-video person re-identification pipeline with two modules is proposed to learn fine-grained and temporally invariant features.
• To address appearance misalignment, a three-dimensional semantic appearance alignment (3D-SAA) module is designed to semantically align different human body parts in the 3D surface space.
• To address modality misalignment, a cross-modal interactive learning (CMIL) module is developed to fuse the two modalities with an interactive similarity comparison mechanism.
• A multi-branch aggregation network in the 3D-SAA module is designed to weaken the influence of negligible body parts and backgrounds.
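To make the image-to-video matching setting concrete, the following is a minimal sketch of the retrieval task only: a single query image feature is compared against gallery videos, each a sequence of per-frame features. It is not the paper's 3D-SAA or CMIL module. The mean temporal pooling, the cosine similarity, the function names (`mean_pool`, `cosine`, `rank_gallery`), and the toy feature values are all assumptions made for illustration.

```python
import math


def mean_pool(frames):
    """Average per-frame feature vectors into one video-level vector.

    Assumption: simple mean pooling stands in for any learned temporal model.
    """
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_gallery(query_image_feat, gallery_videos):
    """Return gallery video ids sorted by descending similarity to the query image."""
    scores = {
        vid: cosine(query_image_feat, mean_pool(frames))
        for vid, frames in gallery_videos.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: one query image feature, two gallery videos of two frames each.
query = [1.0, 0.0, 0.0]
gallery = {
    "video_a": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "video_b": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]],
}
ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy setup the query sits closest to the pooled feature of `video_a`, so it is ranked first. The gap between a single-image feature and a pooled video feature is the kind of modality misalignment the highlights refer to.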