Abstract: Many industrial and service sectors require tools to extract vehicle characteristics from images. This is a complex task, not only because of the variety of noise and the large number of classes, but also because of the constant introduction of new vehicle models to the market. In this paper, we present Veri-Car, an integrated information retrieval approach designed to help with this task. It leverages supervised learning techniques to accurately identify the make, type, model, year, color, and license plate of cars. The approach also addresses the challenge of handling open-world problems, where new car models and variations frequently emerge, by employing a sophisticated combination of pre-trained models and a hierarchical multi-similarity loss. Veri-Car demonstrates robust performance, achieving high precision and accuracy in classifying both seen and unseen data. Additionally, it integrates ensemble license plate detection and an OCR model to extract license plate numbers with impressive accuracy.
What problem does this paper attempt to address?
The paper addresses how to accurately extract vehicle attributes from images, including make, type, model, year, color, and license plate number. The task is especially hard in an open-world setting, where new vehicle models are constantly introduced to the market and traditional classifiers struggle with unseen data distributions. Specifically, the Veri-Car system targets the following key challenges:
1. **Diversity and complexity**: Noise in vehicle images, the large number of classes, and the continuous introduction of new vehicle models make the task complex.
2. **Open-world problem**: New vehicle models and variants appear frequently, so traditional closed-set classifiers perform poorly on unseen data.
3. **Multi-attribute recognition**: Multiple attributes (such as make, model, color, and license plate) must be recognized simultaneously, which increases the difficulty of the task.
To solve these problems, the Veri-Car system adopts the following methods:
- **Supervised learning techniques**: Identify the make, type, model, year, color, and license plate of vehicles through supervised learning.
- **Pre-trained models and metric learning**: Combine pre-trained models (such as CLIP and OpenCLIP) with metric learning to generate high-quality embeddings that distinguish between seen and unseen data.
- **Hierarchical Multi-Similarity Loss (HiMS-Min)**: Captures the hierarchical relationships between vehicle attributes and prevents major errors during inference.
- **Integrated license plate detection and OCR**: Use YOLOv5 for license plate detection and TrOCR for license plate character recognition, achieving high-precision license plate information extraction.
In addition, Veri-Car introduces a human-in-the-loop mechanism to process unseen data, so that the system can be continuously updated and adapt to new vehicle models. With these methods, Veri-Car performs strongly in the open-world setting and maintains high precision and accuracy when classifying both seen and unseen data.
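The open-world retrieval idea can be illustrated with a small sketch. The summary does not state how Veri-Car separates seen from unseen vehicles, so the nearest-neighbor lookup, the cosine similarity measure, the `threshold` value, and the names `cosine` and `retrieve` below are all assumptions made for illustration only, not the paper's method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve(query, gallery, threshold=0.8):
    """Return (label, score) of the nearest gallery entry.

    gallery: list of (embedding, label) pairs for known vehicles.
    threshold: hypothetical cut-off; below it the query is flagged as
    unseen (label None) and could be routed to a human for labeling.
    """
    best_label, best_score = None, -1.0
    for embedding, label in gallery:
        score = cosine(query, embedding)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score  # treated as unseen in this sketch
    return best_label, best_score

# Toy gallery with two hypothetical labels.
gallery = [([1.0, 0.0], "model_a"), ([0.0, 1.0], "model_b")]
print(retrieve([0.9, 0.1], gallery))  # close to model_a
print(retrieve([1.0, 1.0], gallery))  # equally far from both, flagged as unseen
```

In a setup like this, adding a newly labeled vehicle only requires appending its embedding to the gallery, with no retraining of a closed-set classifier.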
### Formula summary
1. **Multi-Similarity Loss**:
\[
L_{\text{MS}}=\frac{1}{B}\sum_{i = 1}^{B}\left[\frac{1}{\alpha}\log\left(1+\sum_{j\in P(i)}e^{-\alpha(s_{ij}-\lambda)}\right)+\frac{1}{\beta}\log\left(1+\sum_{k\in N(i)}e^{\beta(s_{ik}-\lambda)}\right)\right]
\]
where \(B\) is the batch size, \(P(i)\) and \(N(i)\) are the positive and negative sample sets of the \(i\)-th anchor respectively, \(s_{ij}\) and \(s_{ik}\) are similarity scores, and \(\alpha\), \(\beta\), and \(\lambda\) are hyperparameters: \(\alpha\) and \(\beta\) scale the positive and negative terms, and \(\lambda\) sets the similarity margin.
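As a rough illustration, the Multi-Similarity loss above can be computed directly from a pairwise similarity matrix. This is a minimal pure-Python sketch, not the authors' implementation; the function name `multi_similarity_loss` and the default values of `alpha`, `beta`, and `lam` are illustrative assumptions, not the settings used in the paper.

```python
import math

def multi_similarity_loss(sim, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-Similarity loss averaged over a batch.

    sim: B x B nested list, sim[i][j] is the similarity of samples i and j.
    labels: length-B list of class labels.
    alpha, beta, lam: illustrative hyperparameter values (assumed).
    """
    batch_size = len(labels)
    total = 0.0
    for i in range(batch_size):
        # P(i): same-label samples other than the anchor itself.
        pos = [sim[i][j] for j in range(batch_size)
               if j != i and labels[j] == labels[i]]
        # N(i): samples with a different label.
        neg = [sim[i][k] for k in range(batch_size)
               if labels[k] != labels[i]]
        # log1p(x) = log(1 + x); an empty set contributes zero.
        pos_term = math.log1p(sum(math.exp(-alpha * (s - lam)) for s in pos)) / alpha
        neg_term = math.log1p(sum(math.exp(beta * (s - lam)) for s in neg)) / beta
        total += pos_term + neg_term
    return total / batch_size

labels = [0, 0, 1, 1]
# Same-class pairs are similar, cross-class pairs are not.
well_separated = [[1.0, 0.9, 0.1, 0.0],
                  [0.9, 1.0, 0.0, 0.1],
                  [0.1, 0.0, 1.0, 0.9],
                  [0.0, 0.1, 0.9, 1.0]]
# Same-class pairs are dissimilar, cross-class pairs are similar.
poorly_separated = [[1.0, 0.1, 0.9, 0.8],
                    [0.1, 1.0, 0.8, 0.9],
                    [0.9, 0.8, 1.0, 0.1],
                    [0.8, 0.9, 0.1, 1.0]]
print(multi_similarity_loss(well_separated, labels))
print(multi_similarity_loss(poorly_separated, labels))
```

The well-separated batch yields the smaller loss: positives above the margin \(\lambda\) and negatives below it both drive their exponential terms toward zero.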
2. **Hierarchical Multi-Similarity Loss (HiMS-Min)**:
\[
L_{\text{HiMS-Min}}=\sum_{l = L,\ldots,1}\frac{1}{L}\sum_{i = 1}^{B}\frac{\lambda_l}{|Z(i)_l|}\sum_{z_l\in Z(i)_l}\min\left(L_{\text{MS}}(i,z_l),\,L_{\text{min}}^{\text{MS}}(l + 1)\right)
\]
where \(L\) is the number of hierarchical levels, \(B\) is the batch size, \(Z(i)_l\) represents the positive and negative sample sets of the \(l\)-th layer, \(L_{\text{M