Fusing Two Directions in Cross-Domain Adaption for Real Life Person Search by Language.

Kai Niu, Yan Huang, Liang Wang
DOI: https://doi.org/10.1109/iccvw.2019.00225
2019-01-01
Abstract: Person search by language is an important application in video surveillance. Two difficulties make the problem non-trivial: the large visual-semantic discrepancy between images and sentences, and the cross-domain difficulty that arises in real-life applications, where pedestrian images with new identities emerge without any language descriptions for training. In this paper, we first propose a concise and effective framework for image-sentence alignment to deal with the visual-semantic discrepancy. Second, we fuse the two opposite directions, i.e., source to target and target to source, for cross-domain adaption. Extensive experiments validate the superiority of the proposed method on both the source domain and the target domain; the method achieves state-of-the-art performance and won 1st place in the competition.