An Effective Two-stage Training Paradigm Detector for Small Dataset

Zheng Wang, Dong Xie, Hanzhi Wang, Jiang Tian
DOI: https://doi.org/10.48550/arXiv.2309.05652
2023-09-11
Computer Vision and Pattern Recognition
Abstract: Learning from a limited amount of labeled data to pre-train a model has always been viewed as a challenging task. In this report, an effective and robust solution, the two-stage training paradigm YOLOv8 detector (TP-YOLOv8), is designed for the object detection track of the VIPriors Challenge 2023. First, the backbone of YOLOv8 is pre-trained as the encoder using the masked image modeling technique. Then the detector is fine-tuned with elaborate augmentations. During the test stage, test-time augmentation (TTA) is used to enhance each model, and weighted box fusion (WBF) is applied to further boost performance. With this well-designed structure, our approach achieves 30.4% average precision over IoU thresholds from 0.50 to 0.95 on the DelftBikes test set, ranking 4th on the leaderboard.
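To make the fusion step concrete, the following is a minimal single-class sketch of the general weighted box fusion idea, in which overlapping detections from several models or TTA views are merged into one box by a confidence-weighted average of their coordinates. It is not the authors' implementation: the function names, the `(box, score)` input layout, and the `iou_thr=0.55` default are assumptions made for illustration, and scores are assumed to be strictly positive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster):
    """Confidence-weighted average box and mean score of one cluster."""
    total = sum(score for _, score in cluster)  # assumes scores > 0
    coords = tuple(sum(box[i] * score for box, score in cluster) / total
                   for i in range(4))
    return coords, total / len(cluster)


def weighted_boxes_fusion(predictions, iou_thr=0.55):
    """Fuse single-class detections from several sources.

    predictions: one list per model or TTA view, each holding
    (box, score) pairs with box = (x1, y1, x2, y2).
    iou_thr is an illustrative default, not a value from the report.
    """
    n_sources = len(predictions)
    # Visit all detections from the most to the least confident.
    candidates = sorted((det for preds in predictions for det in preds),
                        key=lambda det: det[1], reverse=True)
    clusters, fused = [], []
    for box, score in candidates:
        # Find the fused box that overlaps this detection the most.
        best, best_iou = None, iou_thr
        for idx, (fused_box, _) in enumerate(fused):
            overlap = iou(box, fused_box)
            if overlap > best_iou:
                best, best_iou = idx, overlap
        if best is None:
            # No sufficiently overlapping cluster: start a new one.
            clusters.append([(box, score)])
            fused.append((box, score))
        else:
            clusters[best].append((box, score))
            fused[best] = fuse(clusters[best])
    # Down-weight boxes that only a few of the sources agreed on.
    return [(fbox, fscore * min(len(cluster), n_sources) / n_sources)
            for cluster, (fbox, fscore) in zip(clusters, fused)]
```

Unlike non-maximum suppression, which keeps one detection and discards the rest, this scheme uses every overlapping box to refine the fused coordinates, and a box reported by only one of several sources ends up with a reduced score.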