Vehicle Reidentification Based on Convolution and Vision Transformer Feature Fusion

Rui Gong, Xue Zhang, Jianan Pan, Jie Guo, Xiushan Nie
DOI: https://doi.org/10.1109/mmul.2024.3398189
IF: 3.4911
2024-07-13
IEEE Multimedia
Abstract: Surveillance cameras are now widely deployed for public security, and vehicle reidentification has become an active research area in computer vision. The task remains difficult, however, because of low intraclass similarity and high interclass similarity. This study addresses these challenges with a vehicle reidentification method that integrates convolution and vision transformer features. Specifically, channel-by-channel convolution is incorporated into the feedforward layer to strengthen the extraction of local features. In addition, the information from the last layer's class token and the other patches is fused to yield a comprehensive, feature-rich representation. Experiments on the VeRi776 and VehicleID datasets show that the proposed method outperforms current state-of-the-art vehicle reidentification methods.
computer science, information systems, theory & methods, software engineering, hardware & architecture
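The two mechanisms named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not give the details, so several choices here are assumptions: that "channel-by-channel convolution" behaves like a depthwise 3x3 convolution over patch tokens laid out on their 2D grid, and that the class token is fused with the patches by concatenating it with their mean. The function names `depthwise_conv3x3` and `fuse_class_and_patches`, the kernel size, the zero padding, and the toy values are all hypothetical.

```python
# Illustrative sketch only; names, kernel size, and fusion rule are assumptions.

def depthwise_conv3x3(grid, kernels):
    """Apply one 3x3 kernel per channel (zero padding, stride 1).

    grid:    H x W x C nested lists (patch tokens arranged on their 2D grid)
    kernels: C kernels, each a 3 x 3 nested list; channel ch uses kernels[ch] only
    """
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for ch in range(c):
        k = kernels[ch]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        y, x = i + di, j + dj
                        # Positions outside the grid count as zero.
                        if 0 <= y < h and 0 <= x < w:
                            acc += k[di + 1][dj + 1] * grid[y][x][ch]
                out[i][j][ch] = acc
    return out


def fuse_class_and_patches(class_token, patch_tokens):
    """Concatenate the class token with the mean patch token (assumed fusion rule)."""
    n = len(patch_tokens)
    dim = len(class_token)
    mean_patch = [sum(p[d] for p in patch_tokens) / n for d in range(dim)]
    return class_token + mean_patch


# Toy input: a 2 x 2 grid of patch tokens with 2 channels each.
grid = [[[1.0, 10.0], [2.0, 20.0]],
        [[3.0, 30.0], [4.0, 40.0]]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # leaves channel 0 unchanged
box_sum = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]    # sums channel 1 over each neighborhood
local = depthwise_conv3x3(grid, [identity, box_sum])

# Flatten the grid back into a token sequence and fuse with a toy class token.
patch_tokens = [tok for row in local for tok in row]
fused = fuse_class_and_patches([0.5, -0.5], patch_tokens)
```

Because each channel has its own kernel, channel 0 passes through unchanged while channel 1 is aggregated over spatial neighbors; no information crosses channels. The fused vector has twice the token dimension, which is one simple way of combining the global summary with patch-level information.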