Abstract: The traditional object (person) retrieval (re-identification) task aims to learn a discriminative feature representation with intra-similarity and inter-dissimilarity, and it assumes that the objects in an image have been accurately pre-cropped, either manually or automatically. However, in many real-world search scenarios (e.g., video surveillance), the objects (e.g., persons, vehicles) are seldom accurately detected or annotated. Object-level retrieval therefore becomes intractable without bounding-box annotation, which leads to a new and challenging topic: image-level search that integrates detection and retrieval as joint tasks. In this paper, to address the image search problem, we first introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A Siamese architecture and an on-line pairing strategy for similar and dissimilar objects in the given images are designed. Thanks to the Siamese structure, I-Net learns a shared feature representation on which both the object detection and classification tasks are handled. 2) A novel on-line pairing (OLP) loss is introduced with a dynamic feature dictionary, which alleviates the multi-task training stagnation problem by automatically generating a number of negative pairs to restrict the positives. 3) A hard example priority (HEP) based softmax loss is proposed to improve the robustness of the classification task by selecting hard categories. However, the shared feature representation of I-Net may restrict the task-specific flexibility and learning capability of the detection and retrieval tasks. Therefore, following the philosophy of divide and conquer, we further propose an improved I-Net, called DC-I-Net, which makes two new contributions: 1) Two modules are tailored to handle the different tasks separately within the integrated framework, such that task specificity is guaranteed.
2) A class-center guided HEP loss (C²HEP) that exploits the stored class centers is proposed, such that the intra-similarity and inter-dissimilarity can be captured for the ultimate retrieval task. Extensive experiments on well-known image-level search benchmarks, such as the CUHK-SYSU and PRW datasets for person search and the large-scale WebTattoo dataset for tattoo search, demonstrate that the proposed DC-I-Net outperforms the state-of-the-art tasks-integrated and tasks-separated image search models.
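
The abstract says only that the HEP softmax loss works "by selecting hard categories" and gives no formula. As a rough illustration of that idea, the sketch below assumes that "hard" means the wrong classes with the highest scores, and restricts the softmax cross-entropy to the true class plus those classes. The function name `hep_softmax_loss`, the parameter `num_hard`, and the selection rule are all hypothetical and are not taken from the paper.

```python
import math

def hep_softmax_loss(logits, label, num_hard=2):
    """Cross-entropy over the true class plus the `num_hard` highest-scoring
    wrong classes. This selection rule is an assumed reading of
    'selecting hard categories', not the paper's definition."""
    # Rank the wrong classes by score; the highest-scoring ones are the hardest.
    negatives = [i for i in range(len(logits)) if i != label]
    hard = sorted(negatives, key=lambda i: logits[i], reverse=True)[:num_hard]
    selected = [label] + hard
    # Numerically stable log-sum-exp over the selected classes only.
    top = max(logits[i] for i in selected)
    log_z = top + math.log(sum(math.exp(logits[i] - top) for i in selected))
    # Negative log-probability of the true class under the restricted softmax.
    return log_z - logits[label], selected

# Toy scores for five identity classes; class 0 is the ground truth.
logits = [2.0, 0.5, 1.8, -1.0, 0.1]
loss, selected = hep_softmax_loss(logits, label=0, num_hard=2)
print(selected, round(loss, 4))
```

With `num_hard` equal to the number of wrong classes, this reduces to the ordinary softmax cross-entropy. Smaller values drop the easy classes from the normalizer, so only the hardest competitors contribute to the gradient.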

Tasks Integrated Networks: Joint Detection and Retrieval for Image Search

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

Joint Uneven Channel Information Network with Blend Metric Loss for Person Re-Identification

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Sequential End-to-end Network for Efficient Person Search

Bi-Directional Interaction Network for Person Search

Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search

A Multi-task Joint Framework for Real-time Person Search

Codedretrieval: Joint Image Compression and Retrieval with Neural Networks

Multilevel Interactive Enhanced Network for Infrared Small-Target Detection

DMRNet++: Learning Discriminative Features with Decoupled Networks and Enriched Pairs for One-Step Person Search

Towards Fully Decoupled End-to-End Person Search

Joint discriminative representation learning for end-to-end person search

A Task-Balanced Multiscale Adaptive Fusion Network for Object Detection in Remote Sensing Images

Toward Robust Visual Object Tracking With Independent Target-Agnostic Detection and Effective Siamese Cross-Task Interaction

IMD-Net: Interpretable multi-scale detection network for infrared dim and small objects

A Multi-Task Framework for Infrared Small Target Detection and Segmentation

An Interactively Reinforced Paradigm for Joint Infrared-Visible Image Fusion and Saliency Object Detection

Task-decoupled interactive embedding network for object detection

Person Re-identification Meets Image Search

Inception Convolution and Feature Fusion for Person Search