Generalizable Prompts Guided by Image-Redundant Separation for Vehicle Re-identification

Zhenyu Kuang, Lidong Cheng, Hongyang Zhang, Yue Huang, Xinghao Ding
DOI: https://doi.org/10.1109/jiot.2024.3471673
IF: 10.6
2024-01-01
IEEE Internet of Things Journal
Abstract: Vehicle re-identification (reID) is a critical computer vision task with applications in video surveillance and autonomous vehicles. While significant progress has been made in recent years, domain generalization (DG) in reID remains a challenging and valuable research direction. Learning discriminative features that capture the intrinsic characteristics of vehicles, rather than domain-specific details, is paramount in addressing the domain shift problem, which encompasses disparities in data distribution, feature distribution, and label distribution. Recently, Contrastive Language-Image Pretraining (CLIP) has attracted widespread attention because of its capacity to generalize knowledge across different domains or contexts. When fine-tuned for DG tasks, it can leverage this broad knowledge to perform well in domains or on tasks it has not specifically seen during training. The foremost work in this context is CLIP-reID, which achieves outstanding experimental performance on vehicle datasets by integrating learnable prompts. However, the process of acquiring learnable prompts inevitably incorporates noisy text descriptions, such as background and camera style information, which limits its performance on domain generalization tasks. To address this issue, we propose a CLIP-based Image-Redundant Separation framework (CIRS) that removes redundant domain-specific information and then performs the visual-text alignment of CLIP. Specifically, we employ a classic variational autoencoder for image reconstruction, which encourages the images generated by the vector quantised-variational autoencoder (VQ-VAE) network to contain features unrelated to vehicle IDs. Under the precise guidance of the image-redundant separation framework, a set of generalizable and learnable prompts for each vehicle can be effectively generated for reID. Extensive experimental results indicate that our method achieves remarkable performance on several public datasets.