Abstract: Vehicle recognition technology is widely applied in automatic parking, traffic restrictions, and public security investigations, and plays a significant role in the construction of intelligent transportation systems. Fine-grained vehicle recognition goes beyond conventional vehicle recognition by distinguishing more detailed sub-categories. The task is more challenging because inter-class differences are subtle while intra-class variations are large. Localization-classification subnetworks are an effective and frequently used approach to this task, but previous work has typically relied on deep CNN feature maps for object localization, and their low resolution leads to poor localization accuracy. We propose a multi-layer feature fusion localization (MFFL) method that fuses the high-resolution feature map from a shallow CNN layer with the deep feature map, making full use of the rich spatial information in the shallow feature map to localize objects more precisely. In addition, traditional methods acquire local attention information through complex model designs, which frequently results in regional redundancy or information omission. To address this, we introduce an attention module that adaptively enhances the expressiveness of global features and generates global attention features. These global attention features are then integrated with object-level features and local attention cues to achieve more comprehensive attention enhancement. Lastly, we devise a multi-branch model that applies the above object localization and attention enhancement methods and is trained end to end, so that the branches collaborate to extract fine-grained features adequately. Extensive experiments on the Stanford Cars dataset and the self-built Cars-126 dataset demonstrate the effectiveness of our method, which achieves 97.7% classification accuracy on Stanford Cars, a leading result among existing methods.
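
The idea of fusing a shallow, high-resolution feature map with a deep, low-resolution one for localization can be illustrated with a small sketch. This is not the MFFL implementation: the abstract does not specify the fusion operator or the thresholding rule, so the nearest-neighbour upsampling, the element-wise product, the mean threshold, and all function names below are assumptions chosen only for illustration. The maps are toy single-channel activation maps stored as nested lists.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def normalize(fmap):
    """Rescale a 2-D map to the range [0, 1]."""
    flat = [v for row in fmap for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on constant maps
    return [[(v - lo) / span for v in row] for row in fmap]


def fuse(shallow, deep):
    """Fuse maps by element-wise product (an assumed operator).

    The deep map is upsampled to the shallow map's resolution first,
    so the result keeps the shallow map's spatial detail.
    """
    factor = len(shallow) // len(deep)
    deep_up = normalize(upsample_nearest(deep, factor))
    shallow_n = normalize(shallow)
    return [[s * d for s, d in zip(rs, rd)]
            for rs, rd in zip(shallow_n, deep_up)]


def bounding_box(fmap):
    """Box (top, left, bottom, right) around cells above the mean."""
    flat = [v for row in fmap for v in row]
    thr = sum(flat) / len(flat)
    cells = [(r, c) for r, row in enumerate(fmap)
             for c, v in enumerate(row) if v > thr]
    if not cells:  # constant map: fall back to the full extent
        return 0, 0, len(fmap) - 1, len(fmap[0]) - 1
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


# Toy inputs: the deep 2x2 map says "object on the right half";
# the shallow 4x4 map pins down where on the right it actually is.
deep = [[0.1, 0.9],
        [0.2, 0.8]]
shallow = [[0.0, 0.1, 0.2, 0.1],
           [0.1, 0.0, 0.9, 0.3],
           [0.0, 0.1, 0.8, 0.2],
           [0.1, 0.0, 0.1, 0.0]]

box = bounding_box(fuse(shallow, deep))
print(box)
```

On these toy inputs the deep map alone could only localize the object to the right half of the grid, whereas the fused map yields a box spanning rows 0 to 2 and columns 2 to 3, excluding the bottom row where the shallow map shows no response.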

Multi-layer feature fusion and attention enhancement for fine-grained vehicle recognition research
