Unleashing the Feature Hierarchy Potential: An Efficient Tri-Hybrid Person Search Model

Xi Yang, Menghui Tian, Nannan Wang, Xinbo Gao
DOI: https://doi.org/10.1109/tcsvt.2024.3424261
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Person search aims to locate target pedestrians in scene images and involves both detection and re-identification. Detection seeks to separate pedestrians from the background and focuses on what pedestrians have in common, while re-identification aims to identify the target and focuses on the differences between pedestrians. To address this conflict between detection and re-identification in search tasks, we propose an efficient Tri-Hybrid person search model built on a feature hierarchy design. Our model introduces three feature hybrid models for different feature levels. Before the RoI-Align, we present the “Spatial-Channel Hybrid” (SCH) and the “Token-Channel Hybrid” (TCH). SCH perceives the boundary frame of pedestrians at multiple scales, thereby enhancing the information disparity between pedestrians and the background and refining the accuracy of the detection frame. TCH uses multi-layer perceptrons (MLP) to blend token and channel features, with an emphasis on detecting fine-grained semantic information for pedestrians. The interaction of multi-scale perception and fine-grained semantic information enhances the details of detected pedestrians, making them more suitable for similarity measurement in pedestrian matching. After the RoI-Align, we design the “CNN-Transformer Hybrid” to combine global and local features and extract more comprehensive, detailed features. Extensive experimental results on CUHK-SYSU and PRW demonstrate the effectiveness of the proposed method relative to state-of-the-art approaches. Specifically, our method achieves comparable performance on the two benchmark datasets, CUHK-SYSU and PRW, with mAP scores of 94.62% and 57.84%, respectively.
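The abstract states only that TCH uses MLPs to blend token and channel features. As a rough illustration of that general idea, and not the authors' implementation, the sketch below applies one MLP across tokens and a second across channels, each with a residual connection, in the style of generic MLP mixing layers. All names (`token_channel_mix`, `init_weights`), the bias-free ReLU MLPs, and the sizes are assumptions made for this example.

```python
import random

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def mlp(x, w1, w2):
    """Two-layer perceptron (no biases, ReLU in between) applied to each row of x."""
    hidden = [[max(v, 0.0) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def token_channel_mix(x, token_w, channel_w):
    """x has shape (num_tokens, num_channels); the output has the same shape."""
    # Token mixing: transpose so the MLP acts across tokens for each channel.
    x = add(x, transpose(mlp(transpose(x), *token_w)))
    # Channel mixing: the MLP acts across channels for each token.
    x = add(x, mlp(x, *channel_w))
    return x

def init_weights(rng, dim, hidden):
    """Hypothetical small random weights for a dim -> hidden -> dim MLP."""
    w1 = [[rng.gauss(0, 0.02) for _ in range(hidden)] for _ in range(dim)]
    w2 = [[rng.gauss(0, 0.02) for _ in range(dim)] for _ in range(hidden)]
    return w1, w2

rng = random.Random(0)
tokens, channels = 6, 4  # illustrative sizes only, not taken from the paper
x = [[rng.gauss(0, 1) for _ in range(channels)] for _ in range(tokens)]
out = token_channel_mix(x, init_weights(rng, tokens, 8), init_weights(rng, channels, 8))
print(len(out), len(out[0]))  # prints "6 4"
```

The residual connections keep the input features intact while the two MLPs add cross-token and cross-channel information; how the paper actually combines the two is not specified in the abstract.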
What problem does this paper attempt to address?