A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot

Haoxuan Ding, Qi Wang, Junyu Gao, Qiang Li
2024-08-11
Abstract: Traditional license plate detection and recognition models are often trained on closed datasets, limiting their ability to handle the diverse license plate formats across different regions. The emergence of large-scale pre-trained models has shown exceptional generalization capabilities, enabling few-shot and zero-shot learning. We propose OneShotLP, a training-free framework for video-based license plate detection and recognition, leveraging these advanced models. Starting with the license plate position in the first video frame, our method tracks this position across subsequent frames using a point tracking module, creating a trajectory of prompts. These prompts are input into a segmentation module that uses a promptable large segmentation model to generate local masks of the license plate regions. The segmented areas are then processed by multimodal large language models (MLLMs) for accurate license plate recognition. OneShotLP offers significant advantages, including the ability to function effectively without extensive training data and adaptability to various license plate styles. Experimental results on UFPR-ALPR and SSIG-SegPlate datasets demonstrate the superior accuracy of our approach compared to traditional methods. This highlights the potential of leveraging pre-trained models for diverse real-world applications in intelligent transportation systems. The code is available at https://github.com/Dinghaoxuan/OneShotLP.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the limitations of traditional license plate detection and recognition models when they face the diverse license plate formats used in different regions. Such models are usually trained on closed datasets, which restricts their ability to handle formats from other regions. As a result, each country or region has to develop its own specialized license plate recognition system to meet its traffic management requirements, which significantly increases the cost of data collection, annotation, and model training.

To solve these problems, the paper proposes OneShotLP, a training-free framework for license plate tracking and recognition in videos. The framework relies on the strong generalization ability of large-scale pre-trained models (foundation models), so that the whole task can be completed from a single prompt. The main contributions are:

1. **Propose a training-free license plate tracking and recognition framework**: OneShotLP only needs the approximate location of the license plate to be marked in the first frame of the video. A point tracking module, a segmentation module, and multimodal large language models (MLLMs) then track and recognize the license plate continuously across the entire video sequence.
2. **Design a tracking and recognition pipeline for video license plate analysis**: The pipeline consists of three modules: a tracking module, a segmentation module, and a recognition module. The tracking module follows the marked points from the first frame onward; the segmentation module generates a mask of the license plate region from the tracked points; the recognition module uses an MLLM to read the characters in the segmented license plate image.
3. **Combine foundation models for tracking, detection, and recognition**: These models generalize from limited inputs, so they can handle complex and diverse traffic scenes without extensive retraining.

Experimental results on the UFPR-ALPR and SSIG-SegPlate datasets show that OneShotLP achieves higher accuracy than traditional methods in video license plate tracking and recognition, and that it works in a zero-shot setting while remaining accurate and robust. In summary, the paper aims to build a general-purpose license plate analysis system on top of pre-trained foundation models, reducing the resources required to support different types of license plates and improving application efficiency.
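
The three-module pipeline described above (track a first-frame point, segment around each tracked point, recognize the cropped region) can be sketched roughly as follows. This is only an illustration of the data flow, not the authors' implementation: `run_pipeline`, `crop_to_mask`, `PlateResult`, and the three `toy_*` functions are hypothetical names, and the toy functions are trivial stand-ins for the point tracking model, the promptable segmentation model, and the MLLM, none of which are called here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]   # (x, y) prompt in pixel coordinates
Frame = List[List[int]]       # toy grayscale frame: rows of pixel values
Mask = List[List[bool]]       # True where the plate is assumed to be


@dataclass
class PlateResult:
    frame_index: int
    prompt: Point
    text: str


def crop_to_mask(frame: Frame, mask: Mask) -> Frame:
    """Crop the frame to the bounding box of the True cells in the mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows or not cols:
        return []  # empty mask: nothing to recognize
    return [row[cols[0]:cols[-1] + 1] for row in frame[rows[0]:rows[-1] + 1]]


def run_pipeline(
    frames: List[Frame],
    first_point: Point,
    track_points: Callable[[List[Frame], Point], List[Point]],
    segment: Callable[[Frame, Point], Mask],
    recognize: Callable[[Frame], str],
) -> List[PlateResult]:
    """One-shot flow: a single first-frame point drives all later frames."""
    prompts = track_points(frames, first_point)        # trajectory of prompts
    results = []
    for i, (frame, prompt) in enumerate(zip(frames, prompts)):
        mask = segment(frame, prompt)                  # local plate mask
        crop = crop_to_mask(frame, mask)               # segmented region
        results.append(PlateResult(i, prompt, recognize(crop)))
    return results


# --- Toy stand-ins (assumptions for illustration only) ---------------------

def toy_tracker(frames: List[Frame], start: Point) -> List[Point]:
    # Pretends the plate moves one pixel to the right per frame.
    x, y = start
    return [(x + i, y) for i in range(len(frames))]


def toy_segmenter(frame: Frame, prompt: Point) -> Mask:
    # Marks a 1x3 window centred on the prompt as "plate".
    px, py = int(prompt[0]), int(prompt[1])
    return [[r == py and abs(c - px) <= 1 for c in range(len(frame[0]))]
            for r in range(len(frame))]


def toy_recognizer(crop: Frame) -> str:
    # Reads pixel values as character codes; a real system would query an MLLM.
    return "".join(chr(v) for row in crop for v in row if v)


def make_frame(offset: int) -> Frame:
    # 3x7 frame with the codes for "ABC" written on row 1 starting at `offset`.
    frame = [[0] * 7 for _ in range(3)]
    for k, ch in enumerate("ABC"):
        frame[1][offset + k] = ord(ch)
    return frame


frames = [make_frame(1), make_frame(2)]
results = run_pipeline(frames, (2.0, 1.0), toy_tracker, toy_segmenter, toy_recognizer)
for r in results:
    print(r.frame_index, r.prompt, r.text)
```

Passing the three stages in as callables keeps the one-shot structure visible: the only manual input is `first_point`, and each stage could in principle be swapped for a different pre-trained model without touching the loop.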