Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection

Bingqing Zhang,Sen Wang,Yifan Liu,Brano Kusy,Xue Li,Jiajun Liu

DOI: https://doi.org/10.1145/3581783.3612090

2023-08-22

Abstract: Current video object detection (VOD) models often encounter issues with over-aggregation due to redundant aggregation strategies, which perform feature aggregation on every frame. This results in suboptimal performance and increased computational complexity. In this work, we propose an image-level Object Detection Difficulty (ODD) metric to quantify the difficulty of detecting objects in a given image. The derived ODD scores can be used in the VOD process to mitigate over-aggregation. Specifically, we train an ODD predictor as an auxiliary head of a still-image object detector to compute the ODD score for each image based on the discrepancies between detection results and ground-truth bounding boxes. The ODD score enhances the VOD system in two ways: 1) it enables the VOD system to select superior global reference frames, thereby improving overall accuracy; and 2) it serves as an indicator in the newly designed ODD Scheduler to eliminate the aggregation of frames that are easy to detect, thus accelerating the VOD process. Comprehensive experiments demonstrate that, when utilized for selecting global reference frames, ODD-VOD consistently enhances the accuracy of Global-frame-based VOD models. When employed for acceleration, ODD-VOD consistently improves the frames per second (FPS) by an average of 73.3% across 8 different VOD models without sacrificing accuracy. When combined, ODD-VOD attains state-of-the-art performance when competing with many VOD methods in both accuracy and speed. Our work represents a significant advancement towards making VOD more practical for real-world applications.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses over-aggregation in Video Object Detection (VOD) and proposes a new framework, Object Detection Difficulty-based VOD (ODD-VOD), to improve both detection accuracy and speed.

#### Specific Problems:

1. **Over-aggregation Issue**: Current VOD models perform feature aggregation on every frame, which increases computational cost and reduces detection speed.
2. **Impact of Low-Quality Reference Frames**: Low-quality reference frames may bring no benefit during VOD aggregation and can degrade overall performance.

#### Main Contributions:

1. **ODD Metric**: An image-level Object Detection Difficulty (ODD) metric quantifies how difficult it is to detect the objects in a given image.
2. **ODD Predictor**: An auxiliary module, the ODD predictor, is trained as an extra head of a still-image object detector (SIOD) to predict the ODD score of test frames.
3. **ODD Scheduler**: A hybrid detection pipeline, the ODD scheduler, selects the appropriate detector (SIOD or VOD) based on the ODD score of the input frame, so that easy frames skip aggregation.
4. **Global Reference Frame Selector (OGRFS)**: A global reference frame selection method based on ODD scores picks high-quality reference frames during both training and inference.

With these components, the paper reports an average FPS improvement of 73.3% across 8 VOD models without sacrificing accuracy, and improved accuracy for global-frame-based VOD models when ODD scores are used to select reference frames.
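The scheduling and reference-frame ideas summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `threshold` value, the convention that a higher ODD score means a harder frame, and the rule that the lowest-scoring frames make the best references are all assumptions made for illustration. The detectors are passed in as plain callables, and the ODD scores are assumed to be precomputed by some predictor.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Detection = TypeVar("Detection")


def schedule_detectors(
    frames: Sequence[Frame],
    odd_scores: Sequence[float],
    still_image_detector: Callable[[Frame], Detection],
    video_detector: Callable[[Frame], Detection],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Detection]:
    """Route each frame to the cheap SIOD or the aggregating VOD model."""
    if len(frames) != len(odd_scores):
        raise ValueError("frames and odd_scores must have the same length")
    results: List[Detection] = []
    for frame, score in zip(frames, odd_scores):
        # Assumed convention: a higher ODD score means a harder frame.
        if score > threshold:
            results.append(video_detector(frame))  # hard: aggregate features
        else:
            results.append(still_image_detector(frame))  # easy: skip aggregation
    return results


def select_global_reference_frames(odd_scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k lowest-ODD frames, in temporal order.

    Treating the easiest frames as the best references is an assumption here.
    """
    by_difficulty = sorted(range(len(odd_scores)), key=lambda i: odd_scores[i])
    return sorted(by_difficulty[:k])


# Toy usage with string stand-ins for frames and detectors.
scores = [0.1, 0.8, 0.3, 0.6]
routed = schedule_detectors(
    ["f0", "f1", "f2", "f3"],
    scores,
    still_image_detector=lambda f: ("SIOD", f),
    video_detector=lambda f: ("VOD", f),
)
print(routed)
print(select_global_reference_frames(scores, k=2))
```

In this toy run, frames `f1` and `f3` exceed the assumed threshold and go to the VOD callable, while `f0` and `f2` are handled by the still-image callable. In practice the threshold would need tuning against the accuracy and speed trade-off the paper reports.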

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Practical Video Object Detection via Feature Selection and Aggregation

Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation

Adaptive Feature Aggregation for Video Object Detection

Efficient One-stage Video Object Detection by Exploiting Temporal Consistency

Confidence-guided Adaptive Gate and Dual Differential Enhancement for Video Salient Object Detection

Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection

DFA: Dynamic Feature Aggregation for Efficient Video Object Detection

Frame Offloading Scheduling Algorithm for Real-time Object Detection in Edge Environment

Optical-flow-based framework to boost video object detection performance with object enhancement

Video object detection via space–time feature aggregation and result reuse

Temporal-adaptive sparse feature aggregation for video object detection

Frequency-Adaptive Low-Latency Object Detection Using Events and Frames

YOLODCC: Improved YOLOv8 combined with dynamic confidence compensation for lightweight moving object detection

Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline

TransVOD: End-to-End Video Object Detection With Spatial-Temporal Transformers

Adaptive occlusion object detection algorithm based on OL-IoU

What Makes Good Open-Vocabulary Detector: A Disassembling Perspective

Accelerating real‐time object detection in high‐resolution video surveillance

USD: Unknown Sensitive Detector Empowered by Decoupled Objectness and Segment Anything Model