XS-VID: An Extremely Small Video Object Detection Dataset

Jiahao Guo, Ziyang Xu, Lianjun Wu, Fei Gao, Wenyu Liu, Xinggang Wang

2024-07-25

Abstract: Small Video Object Detection (SVOD) is a crucial subfield in modern computer vision, essential for early object discovery and detection. However, existing SVOD datasets are scarce and suffer from issues such as insufficiently small objects, limited object categories, and lack of scene diversity, leading to unitary application scenarios for corresponding methods. To address this gap, we develop the XS-VID dataset, which comprises aerial data from various periods and scenes, and annotates eight major object categories. To further evaluate existing methods for detecting extremely small objects, XS-VID extensively collects three types of objects with smaller pixel areas: extremely small (\textit{es}, $0\sim12^2$), relatively small (\textit{rs}, $12^2\sim20^2$), and generally small (\textit{gs}, $20^2\sim32^2$). XS-VID offers unprecedented breadth and depth in covering and quantifying minuscule objects, significantly enriching the scene and object diversity in the dataset. Extensive validations on XS-VID and the publicly available VisDrone2019VID dataset show that existing methods struggle with small object detection and significantly underperform compared to general object detectors. Leveraging the strengths of previous methods and addressing their weaknesses, we propose YOLOFT, which enhances local feature associations and integrates temporal motion features, significantly improving the accuracy and stability of SVOD. Our datasets and benchmarks are available at https://gjhhust.github.io/XS-VID/.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper primarily addresses the following issues:

1. **Insufficiency of Small Video Object Detection (SVOD) Datasets**:
   - Existing SVOD datasets are scarce, and their objects are often not small enough, their categories are limited, and their scenes lack diversity. Methods built on them therefore cover only a narrow range of application scenarios.
   - To fill this gap, the authors developed the XS-VID dataset, which includes aerial data collected from different time periods and scenes, annotated with 8 major object categories.
2. **Limitations of Existing Methods in Extremely Small Object Detection**:
   - Extensive validation on XS-VID and the publicly available VisDrone2019VID dataset shows that existing methods perform poorly in small object detection, especially for extremely small objects (pixel area in the range $0\sim12^2$).
   - These methods exhibit significant deficiencies in handling background confusion, misclassification, and texture distortion.
3. **Proposing an Improved SVOD Network**:
   - To address the above issues, the authors proposed the YOLOFT network, which combines YOLOv8 with flow-based recurrent full-field transforms (Field Transforms). It enhances local feature associations and integrates temporal motion features, significantly improving the accuracy and stability of small object detection.

Through these efforts, the paper aims to advance research and development in small video object detection and to provide a benchmark reference for future studies.
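The three size ranges quoted in the abstract (\textit{es}, \textit{rs}, \textit{gs}) can be illustrated with a minimal sketch that assigns a bounding box to a bucket by its pixel area. This is not the authors' evaluation code: the function name `size_bucket`, the `"other"` label for larger boxes, and the choice to treat each upper bound as exclusive are assumptions made here for illustration, since the abstract does not say how boundary values are handled.

```python
from typing import Literal

SizeBucket = Literal["es", "rs", "gs", "other"]

# Upper area bounds in square pixels, taken from the abstract.
ES_MAX_AREA = 12 ** 2  # extremely small: area in 0 ~ 12^2
RS_MAX_AREA = 20 ** 2  # relatively small: area in 12^2 ~ 20^2
GS_MAX_AREA = 32 ** 2  # generally small: area in 20^2 ~ 32^2


def size_bucket(width: float, height: float) -> SizeBucket:
    """Return the size bucket for a box of the given width and height in pixels.

    Assumption: upper bounds are exclusive, so a 12x12 box falls in "rs".
    Boxes with area of 32^2 or more get the hypothetical label "other".
    """
    if width < 0 or height < 0:
        raise ValueError("width and height must be non-negative")
    area = width * height
    if area < ES_MAX_AREA:
        return "es"
    if area < RS_MAX_AREA:
        return "rs"
    if area < GS_MAX_AREA:
        return "gs"
    return "other"
```

For example, under these assumptions a 10x10 box is counted as `"es"`, a 15x15 box as `"rs"`, and a 25x30 box as `"gs"`.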

