CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification

Jiawei Liu,Zheng-Jun Zha,Hongtao Xie,Zhiwei Xiong,Yongdong Zhang
DOI: https://doi.org/10.1145/3240508.3240585
2018-01-01
Abstract:Person re-identification aims to identify the same pedestrian across non-overlapping camera views. Deep learning techniques have been applied for person re-identification recently, towards learning representation of pedestrian appearance. This paper presents a novel Contextual-Attentional Attribute-Appearance Network ($\rm CA^3Net$) for person re-identification. The $\rm CA^3Net$ simultaneously exploits the complementarity between semantic attributes and visual appearance, the semantic context among attributes, visual attention on attributes as well as spatial dependencies among body parts, leading to discriminative and robust pedestrian representation. Specifically, an attribute network within $\rm CA^3Net$ is designed with an Attention-LSTM module. It concentrates the network on latent image regions related to each attribute as well as exploits the semantic context among attributes by a LSTM module. An appearance network is developed to learn appearance features from the full body, horizontal and vertical body parts of pedestrians with spatial dependencies among body parts. The $\rm CA^3Net$ jointly learns the attribute and appearance features in a multi-task learning manner, generating comprehensive representation of pedestrians. Extensive experiments on two challenging benchmarks, i.e., Market-1501 and DukeMTMC-reID datasets, have demonstrated the effectiveness of the proposed approach.
What problem does this paper attempt to address?