Abstract: This work draws attention to the large fraction of near-duplicates in the training and test sets of datasets widely adopted in License Plate Recognition (LPR) research. These duplicates refer to images that, although different, show the same license plate. Our experiments, conducted on the two most popular datasets in the field, show a substantial decrease in recognition rate when six well-known models are trained and tested under fair splits, that is, in the absence of duplicates in the training and test sets. Moreover, in one of the datasets, the ranking of models changed considerably when they were trained and tested under duplicate-free splits. These findings suggest that such duplicates have significantly biased the evaluation and development of deep learning-based models for LPR. The list of near-duplicates we have found and proposals for fair splits are publicly available for further research at https://raysonlaroca.github.io/supp/lpr-train-on-test/

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper exposes the presence of near-duplicate images across the training and test sets of datasets widely used in License Plate Recognition (LPR) research, and examines how these duplicates affect the evaluation of model performance.

#### Main Findings:

1. **Existence of Near-Duplicate Images**: The paper analyzes two of the most commonly used datasets, AOLP and CCPD, and finds that both contain a large number of near-duplicate images: images that, although different, show the same license plate.
2. **Impact on Model Performance Evaluation**: When six popular Optical Character Recognition (OCR) models are re-evaluated after the near-duplicates are removed, their performance declines significantly. This indicates that the traditional evaluation protocol overestimates the models' generalization ability because duplicates appear in both the training and test sets.
3. **Changes in Model Rankings**: When the models are re-evaluated without duplicates, the rankings of some models change significantly. For example, on the AOLP dataset, the CNNG model drops from the best result to third place.

#### Specific Results:

- **AOLP Dataset**:
  - After the duplicates are removed, the recognition rates of all models decline significantly, with error rates more than doubling.
  - Model rankings change, indicating that some models may have been overestimated in the past because of the high proportion of duplicate images.
- **CCPD Dataset**:
  - After the duplicates are removed, the average recognition rate drops from 80.3% to 77.6%, with some models, such as STAR-Net and TRBA, declining more noticeably.
  - Although model rankings do not change much, the absolute number of errors increases significantly: the largest performance gap corresponds to over 8,000 misrecognized license plates.
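
The notion of a near-duplicate used above (different images showing the same license plate) can be illustrated with a minimal Python sketch. It assumes that each image comes with its ground-truth plate text and treats a test image as a near-duplicate when the same text also occurs in the training set. The function name `duplicate_fraction` and the sample labels are hypothetical, and the authors' actual detection procedure may differ.

```python
def duplicate_fraction(train_labels, test_labels):
    """Return the fraction of test samples whose plate text also occurs in training.

    Assumption: two images count as near-duplicates when their ground-truth
    plate strings are identical. This is an illustrative criterion only.
    """
    if not test_labels:
        return 0.0
    train_plates = set(train_labels)
    hits = sum(1 for plate in test_labels if plate in train_plates)
    return hits / len(test_labels)


# Made-up labels for illustration; they do not come from AOLP or CCPD.
train = ["ABC1234", "XYZ9876", "ABC1234", "JKL5555"]
test = ["ABC1234", "MNO0001", "XYZ9876", "QRS2222"]
print(duplicate_fraction(train, test))
```

A high value on a real split would suggest that test accuracy partly measures memorization of plates already seen during training.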
#### Conclusion:

The paper stresses that near-duplicate images deserve attention in LPR research and recommends that future work keep them out of the training and test sets so that model evaluation is fair and accurate. It also notes that similar issues exist in other datasets and cautions researchers to watch for duplicates across datasets.
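
One generic way to build a duplicate-free split is to assign whole groups of images that share a plate to a single side of the split. The sketch below is not the fair-split proposal published by the authors. It assumes samples are `(image_path, plate_text)` pairs, and the name `plate_disjoint_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def plate_disjoint_split(samples, test_ratio=0.2, seed=0):
    """Split (image_path, plate_text) pairs so that no plate is in both sets.

    Assumption: identical plate text identifies near-duplicates. Whole plate
    groups go to the test set until it holds about test_ratio of the samples;
    the remaining groups go to the training set.
    """
    groups = defaultdict(list)
    for path, plate in samples:
        groups[plate].append((path, plate))

    plates = sorted(groups)              # deterministic order before shuffling
    random.Random(seed).shuffle(plates)  # seeded for reproducibility

    target = test_ratio * len(samples)
    train, test = [], []
    for plate in plates:
        bucket = test if len(test) < target else train
        bucket.extend(groups[plate])
    return train, test


# Made-up samples for illustration: 12 images covering 4 distinct plates.
samples = [(f"img_{i}.jpg", f"PLATE{i % 4}") for i in range(12)]
train, test = plate_disjoint_split(samples, test_ratio=0.25, seed=1)
shared = {plate for _, plate in train} & {plate for _, plate in test}
print(len(train), len(test), shared)
```

Because entire groups are moved together, the realized test fraction can deviate from `test_ratio` when a few plates account for many images.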

Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition

A First Look at Dataset Bias in License Plate Recognition

RoLMA: A Practical Adversarial Attack Against Deep Learning-Based LPR Systems.

Spot evasion attacks: Adversarial examples for license plate recognition systems with convolutional neural networks

Do We Train on Test Data? Purging CIFAR of Near-Duplicates

Benchmarking Probabilistic Deep Learning Methods for License Plate Recognition

An End to End Recognition for License Plates Using Convolutional Neural Networks

A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot

Towards Low-resource License Plate Recognition Via Feature Shuffling

Enhancing ALPR: a two stage YOLO model with data augmentation for improved accuracy and robustness

How many labeled license plates are needed?

Recognizing License Plates in Real-Time

A Robust Attentional Framework for License Plate Recognition in the Wild

Character Time-series Matching For Robust License Plate Recognition

ALP-Net: a segmentation-free approach for license plate recognition in unconstrained scenarios

Vehicle and License Plate Recognition with Novel Dataset for Toll Collection

Comparison of Image Preprocessing Techniques for Vehicle License Plate Recognition Using OCR: Performance and Accuracy Evaluation

Enhancement of license plate recognition performance using Xception with Mish activation function

A Single-Target License Plate Detection with Attention

Investigating the Effects of Image Correction Through Affine Transformations on Licence Plate Recognition

Part-Regularized Near-Duplicate Vehicle Re-Identification.