Abstract: Person search is a time-consuming computer vision task that entails locating and recognizing query people in scene images. Body parts are often mismatched during matching because of position variation, occlusions, and partially missing body parts, which leads to unsatisfactory person search results. Existing approaches that extract local features of the human body from keypoint information cannot handle the search task when distinct body parts are misaligned, and they fail to exploit multiple granularities, which are crucial in the person search process. Moreover, alignment learning methods learn body part features with fixed and equal weights, ignoring beneficial contextual information (e.g., an umbrella carried by the pedestrian) that supplies compelling clues for identifying the person. In this paper, we propose a Coarse-to-Fine Adaptive Alignment Representation (CFA²R) network that learns multi-granular features for misaligned person search from a coarse-to-fine perspective. To exploit more beneficial body parts and the related context of the cropped pedestrians, we design a Part-Attentional Progressive Module (PAPM) that guides the network to focus on informative body parts and helpful accessory regions. In addition, we propose a Re-weighting Alignment Module (RAM) that emphasizes the more contributive parts instead of treating all parts equally. Specifically, a Re-weighting Reconstruction module reconstructs adaptively re-weighted, rather than fixed, part features, since different parts contribute unequally during image matching. Extensive experiments on the CUHK-SYSU and PRW datasets demonstrate the competitive performance of the proposed method.
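The contrast the abstract draws between fixed, equal part weights and adaptive re-weighting can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions, not the RAM or the Re-weighting Reconstruction module itself: the function names, the use of cosine similarity, the softmax over per-part scores, and the hand-picked scores are all hypothetical choices made for illustration (in a real network the scores would be learned).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax that turns raw scores into weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def part_similarity(query_parts, gallery_parts, part_scores=None):
    """Weighted sum of per-part cosine similarities.

    With part_scores=None every part gets the same fixed weight.
    Otherwise the (hypothetical) per-part scores are turned into
    adaptive weights with a softmax.
    """
    sims = [cosine(q, g) for q, g in zip(query_parts, gallery_parts)]
    if part_scores is None:
        weights = [1.0 / len(sims)] * len(sims)
    else:
        weights = softmax(part_scores)
    return sum(w * s for w, s in zip(weights, sims))


# Toy part features: the first two parts match, the third is mismatched
# (standing in for an occluded or misaligned body part).
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

equal = part_similarity(query, gallery)
# Hand-picked scores that down-weight the unreliable third part (assumption).
adaptive = part_similarity(query, gallery, part_scores=[2.0, 2.0, -2.0])

print(f"equal weights:    {equal:.4f}")
print(f"adaptive weights: {adaptive:.4f}")
```

In this toy setting the mismatched third part pulls the equally weighted score down, while the adaptive weighting largely suppresses it, which is the intuition behind not treating all parts equally.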

Fusing Two Directions in Cross-Domain Adaption for Real Life Person Search by Language

Privacy-Preserving and Cross-Domain Human Sensing by Federated Domain Adaptation with Semantic Knowledge Correction

Cross-modal Co-occurrence Attributes Alignments for Person Search by Language

Domain Adaptive Person Search via GAN-based Scene Synthesis for Cross-scene Videos

Beyond the Parts: Learning Coarse-to-Fine Adaptive Alignment Representation for Person Search

Text-based Person Search in Full Images via Semantic-Driven Proposal Generation

Adaptive and Collaborative Multi-scale Alignment for Text-Based Person Search

Fine-Granularity Alignment for Text-Based Person Retrieval Via Semantics-Centric Visual Division

Attentive Feature Focusing for Person Search by Natural Language

A cross-view intelligent person search method based on multi-feature constraints

Graph-Based Local Feature Adaptation for Cross-Domain Person Re-Identification

Textual Dependency Embedding for Person Search by Language

Domain Adaptation for Semantic Segmentation of Road Scenes Via Two-Stage Alignment of Traffic Elements

Person Search by Multi-Scale Matching

Multi-Task Domain Adaptation for Language Grounding with 3D Objects

Cross-Domain Person Re-Identification Based on Feature Fusion

Cross-Modality Domain Adaptation for Freespace Detection: A Simple Yet Effective Baseline

Cross-Database Facial Expression Recognition via Unsupervised Domain Adaptive Dictionary Learning

Comprehensive Attribute Prediction Learning for Person Search by Language

Double-domain Adaptation Semantics for Retrieval-based Long-term Visual Localization