Abstract: We develop a probabilistic interpretation of two-stage object detection. We show that this probabilistic interpretation motivates a number of common empirical training practices. It also suggests changes to two-stage detection pipelines. Specifically, the first stage should infer proper object-vs-background likelihoods, which should then inform the overall score of the detector. A standard region proposal network (RPN) cannot infer this likelihood sufficiently well, but many one-stage detectors can. We show how to build a probabilistic two-stage detector from any state-of-the-art one-stage detector. The resulting detectors are faster and more accurate than both their one- and two-stage precursors. Our detector achieves 56.4 mAP on COCO test-dev with single-scale testing, outperforming all published results. Using a lightweight backbone, our detector achieves 49.2 mAP on COCO at 33 fps on a Titan Xp, outperforming the popular YOLOv4 model.

What problem does this paper attempt to address?

The main problem this paper addresses is that existing two-stage object detectors are not designed or trained around a consistent probabilistic interpretation, which limits both their accuracy and their speed. Specifically:

1. **Limitations of the first stage**: A traditional region proposal network (RPN) is trained mainly to maximize recall and does not estimate the object-vs-background likelihood well. The candidate boxes produced by the first stage are therefore of low quality, which hurts the accuracy of the subsequent classification.
2. **No consistent probabilistic framework**: Although the second stage has a probabilistic interpretation, the two-stage detector as a whole has no unified probabilistic framework for combining the outputs of the two stages. This makes training and inference less efficient and less consistent.
3. **Need for better performance**: The paper aims to redesign the two-stage detector around a probabilistic interpretation so that it is both faster and more accurate. The specific goal is to replace the traditional RPN with a strong one-stage detector and to combine the two stages to obtain better performance.

### Solution

The paper proposes a two-stage object detection framework with a probabilistic interpretation. The main contributions are:

- **Probabilistic interpretation**: The first stage of the two-stage detector is treated as inferring the object-vs-background likelihood, which then informs the final detection score.
- **Improved first-stage design**: A strong one-stage detector (such as RetinaNet or CenterNet) serves as the first stage. These detectors provide more accurate object likelihood estimates rather than only pursuing high recall.
- **Joint optimization**: Optimizing a joint probabilistic objective makes the training of the two stages more coordinated and consistent, which improves overall performance.

### Experimental results

Experiments show that this probabilistic framework improves both the accuracy and the speed of the two-stage detector. On COCO test-dev with single-scale testing, the proposed framework reaches 56.4 mAP with a ResNeXt-101-DCN backbone, exceeding all previously published results. With the lightweight DLA-BiFPN backbone, it achieves 49.2 mAP at 33 fps on a Titan Xp GPU, outperforming the popular YOLOv4 model.

In conclusion, the paper improves the design of two-stage object detectors by introducing a probabilistic interpretation, addresses the shortcomings of the traditional RPN-based design, and shows significant performance improvements on multiple benchmark datasets.
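The idea that the first-stage likelihood should inform the final score can be illustrated with a small sketch. It assumes the combined score for a foreground class is the product of the first-stage object likelihood and the second-stage class probability, and that the background probability collects the remaining mass. The `Proposal` class, its field names, and `joint_scores` are hypothetical names introduced for illustration; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Proposal:
    # Assumed first-stage output: P(object) for this box, in [0, 1].
    objectness: float
    # Assumed second-stage output: P(class | object).
    # The last entry is the background class; the entries sum to 1.
    class_given_object: List[float]


def joint_scores(p: Proposal) -> List[float]:
    """Combine both stages into one distribution over classes plus background.

    Foreground class c:  P(object) * P(c | object)
    Background:          P(not object) + P(object) * P(background | object)
    """
    foreground = [p.objectness * c for c in p.class_given_object[:-1]]
    background = (1.0 - p.objectness) + p.objectness * p.class_given_object[-1]
    return foreground + [background]


# Same second-stage output, different first-stage confidence.
confident = Proposal(objectness=0.9, class_given_object=[0.8, 0.1, 0.1])
doubtful = Proposal(objectness=0.2, class_given_object=[0.8, 0.1, 0.1])

print(joint_scores(confident))
print(joint_scores(doubtful))
```

In this sketch, a box that the first stage considers unlikely to contain an object receives a low final score even when the second-stage classifier is confident, which is the behavior a recall-oriented RPN score would not provide.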

Probabilistic two-stage detection

A Two-Stage Human Body Detector on Depth Data

Overview of Two-Stage Object Detection Algorithms

Can the Query-based Object Detector Be Designed with Fewer Stages?

Probabilistic Approach for Road-Users Detection

AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point Clouds

Efficient One-stage Video Object Detection by Exploiting Temporal Consistency

Condensing Two-stage Detection with Automatic Object Key Part Discovery

Light-Head R-CNN: In Defense of Two-Stage Object Detector

Accurate Single Stage Detector Using Recurrent Rolling Convolution

Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

Robust one-stage object detection with location-aware classifiers

Corner Proposal Network for Anchor-Free, Two-Stage Object Detection

Feature difference for single-shot object detection

HPV-RCNN: Hybrid Point–Voxel Two-Stage Network for LiDAR-Based 3-D Object Detection

Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing

An Effective Two-stage Training Paradigm Detector for Small Dataset

Towards Discriminative and Transferable One-Stage Few-Shot Object Detectors

HTD: Heterogeneous Task Decoupling for Two-Stage Object Detection

Modification method for single-stage object detectors that allows to exploit the temporal behaviour of a scene to improve detection accuracy

Dual Relation Knowledge Distillation for Object Detection