The Solution for Single Object Tracking Task of Perception Test Challenge 2024

Zhiqiang Zhong, Yang Yang, Fengqiang Wan, Henglu Wei, Xiangyang Ji
2024-10-19
Abstract: This report presents our method for Single Object Tracking (SOT), which aims to track a specified object throughout a video sequence. We employ the LoRAT method. The essence of the work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency, to the domain of visual tracking. We train our model using the extensive LaSOT and GOT-10k datasets, which provide a solid foundation for robust performance. Additionally, we implement the alpha-refine technique for post-processing the bounding box outputs. Although the alpha-refine method does not yield the anticipated results, our overall approach achieves a score of 0.813, securing first place in the competition.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses performance improvement and model optimization in the Single Object Tracking (SOT) task. Specifically, the research team aims to track a specified object more accurately throughout a video sequence by improving the structure and training methods of existing models. To achieve this goal, they adopted the following strategies:

1. **Application of LoRA technology**: LoRA (Low-Rank Adaptation) is a fine-tuning technique that adjusts only a small subset of the model's parameters and does so without adding inference latency. The research team applied LoRA to visual tracking (the LoRAT method) to improve the adaptability and efficiency of the model.
2. **Use of large-scale datasets**: To enhance the robustness and generalization ability of the model, the research team trained on two large-scale datasets, LaSOT and GOT-10k. These datasets provide diverse scenarios, which help the model cope with complex situations in practical applications.
3. **Alpha-Refine post-processing technology**: Alpha-Refine aims to improve tracking performance through more accurate bounding box estimation. The research team applied it as a post-processing step on the bounding box outputs, but it did not achieve the expected effect.

Overall, the proposed method obtained an average IoU (Intersection over Union) score of 0.813 in the Perception Test Challenge 2024, winning first place.
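LoRA can fine-tune without adding inference latency because its low-rank update can be folded into the frozen weight matrix once training is done. The sketch below illustrates that merge on a single linear layer in plain Python. It is not the authors' LoRAT implementation: the function names, the column-vector convention, and the tiny matrices are all assumptions chosen for illustration.

```python
# Illustrative sketch of LoRA on one linear layer (hypothetical names and
# sizes; not taken from the LoRAT code base).
# Convention: y = W x, with W of shape (d_out, d_in), x a (d_in, 1) column.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[p * s for p in row] for row in a]

def lora_forward(w, lora_a, lora_b, x, scaling):
    """Training-time forward pass: frozen path plus low-rank path.

    lora_a has shape (r, d_in) and lora_b has shape (d_out, r);
    only these two small matrices would be trained.
    """
    frozen = matmul(w, x)
    low_rank = matmul(lora_b, matmul(lora_a, x))
    return add(frozen, scale(low_rank, scaling))

def merge_lora(w, lora_a, lora_b, scaling):
    """Fold the update into the base weight: W' = W + scaling * (B A)."""
    return add(w, scale(matmul(lora_b, lora_a), scaling))

def lora_param_count(d_out, d_in, r):
    """Number of trainable parameters in the two LoRA matrices."""
    return r * (d_in + d_out)

# Tiny example with rank r = 1 and scaling = alpha / r = 2.
W = [[1, 0, 2], [0, 1, -1]]   # frozen weight, shape (2, 3)
A = [[1, 2, 0]]               # LoRA down-projection, shape (1, 3)
B = [[1], [-1]]               # LoRA up-projection, shape (2, 1)
x = [[1], [2], [3]]           # input column vector
scaling = 2

y_adapter = lora_forward(W, A, B, x, scaling)
y_merged = matmul(merge_lora(W, A, B, scaling), x)
```

After merging, inference uses a single matrix of the original shape, so the adapted layer costs the same as the unadapted one, while training touches only `r * (d_in + d_out)` parameters per layer.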
### Formula representation

The formulas in the paper mainly concern the evaluation metric, the average Intersection over Union (average IoU), which is calculated as follows:

\[ \text{average IoU}=\frac{1}{N}\sum_{i = 1}^{N}\frac{\text{Area of Overlap}_i}{\text{Area of Union}_i} \]

where:

- \(N\) is the number of tracked objects;
- \(\text{Area of Overlap}_i\) is the area of the overlapping region between the predicted bounding box and the ground-truth bounding box for the \(i\)-th object;
- \(\text{Area of Union}_i\) is the area of the union region between the predicted bounding box and the ground-truth bounding box for the \(i\)-th object.

This formula quantifies the performance of the model in the tracking task.
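As a concrete reading of the average IoU formula, the following minimal sketch computes it for axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function names are assumptions made for this example; the challenge's official evaluation code may use a different box format and aggregation.

```python
# Minimal sketch of average IoU for axis-aligned boxes.
# Assumption: each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.

def box_iou(box_a, box_b):
    """Intersection over Union of two boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero
    # so that disjoint boxes contribute no overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    overlap = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - overlap

    # Two degenerate (zero-area) boxes have no defined IoU; return 0.
    return overlap / union if union > 0 else 0.0

def average_iou(predictions, ground_truths):
    """Mean IoU over N paired predicted and ground-truth boxes."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    total = sum(box_iou(p, g) for p, g in zip(predictions, ground_truths))
    return total / len(predictions)

# Example: one perfect prediction and one that misses entirely.
preds = [(0, 0, 2, 2), (0, 0, 1, 1)]
gts = [(0, 0, 2, 2), (5, 5, 6, 6)]
score = average_iou(preds, gts)
```

Here the first pair has IoU 1 and the second has IoU 0, so `score` is 0.5.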