Probabilistic two-stage detection

Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl
DOI: https://doi.org/10.48550/arXiv.2103.07461
2021-03-13
Abstract: We develop a probabilistic interpretation of two-stage object detection. We show that this probabilistic interpretation motivates a number of common empirical training practices. It also suggests changes to two-stage detection pipelines. Specifically, the first stage should infer proper object-vs-background likelihoods, which should then inform the overall score of the detector. A standard region proposal network (RPN) cannot infer this likelihood sufficiently well, but many one-stage detectors can. We show how to build a probabilistic two-stage detector from any state-of-the-art one-stage detector. The resulting detectors are faster and more accurate than both their one- and two-stage precursors. Our detector achieves 56.4 mAP on COCO test-dev with single-scale testing, outperforming all published results. Using a lightweight backbone, our detector achieves 49.2 mAP on COCO at 33 fps on a Titan Xp, outperforming the popular YOLOv4 model.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the fact that existing two-stage object detectors are designed and trained without a full probabilistic interpretation, which limits both their accuracy and their speed. Specifically:

1. **Limitations of the first stage**: The traditional Region Proposal Network (RPN) is trained mainly to maximize recall and cannot estimate the object-vs-background likelihood well. As a result, the candidate boxes produced by the first stage are of low quality, which hurts the accuracy of the subsequent classification.
2. **No consistent probabilistic framework**: Although the second stage has a probabilistic interpretation, the detector as a whole has no unified probabilistic framework for combining the outputs of the two stages. This makes training and inference less efficient and less consistent.
3. **Need for better performance**: The paper aims to redesign the two-stage detector around a probabilistic interpretation so that it is both faster and more accurate. Concretely, it replaces the traditional RPN with a strong one-stage detector and combines the two stages.

### Solution

The paper proposes a new two-stage object detection framework with a probabilistic interpretation. The main contributions include:

- **Probabilistic interpretation**: The first stage of the two-stage detector is treated as inferring the object-vs-background likelihood, which then informs the final detection score.
- **Improved first-stage design**: A strong one-stage detector (such as RetinaNet or CenterNet) serves as the first stage. These detectors provide more accurate object likelihood estimates instead of only pursuing high recall.
- **Joint optimization**: Both stages are trained against a joint probabilistic objective, so that their training is coordinated and consistent, which improves overall performance.

### Experimental results

Experiments show that the probabilistic framework improves both the accuracy and the speed of the two-stage detector. On COCO test-dev with a Res2Net-101-DCN-BiFPN backbone, the proposed detector reaches 56.4 mAP with single-scale testing, exceeding all previously published results. With the lightweight DLA-BiFPN backbone, it achieves 49.2 mAP at 33 fps on a Titan Xp GPU, outperforming the popular YOLOv4 model.

In conclusion, the paper improves the design of the two-stage object detector by giving it a probabilistic interpretation, addresses the weaknesses of the traditional RPN-based pipeline, and shows significant performance improvements on multiple benchmark datasets.
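
To make the score combination described under "Solution" concrete, here is a minimal sketch of one way a first-stage object likelihood could be folded into the final detection score. It assumes the second stage outputs a distribution over background plus the object classes, conditioned on the region being an object proposal, and it combines the two stages with the law of total probability. The function name `combine_scores` and the input layout are hypothetical and are not taken from the paper's code.

```python
from typing import List


def combine_scores(objectness: float, class_probs: List[float]) -> List[float]:
    """Combine first-stage and second-stage outputs into one distribution.

    objectness:  assumed first-stage estimate P(O=1) for a region.
    class_probs: assumed second-stage distribution
                 [P(bg | O=1), P(c_1 | O=1), ..., P(c_n | O=1)], summing to 1.
    Returns [P(bg), P(c_1), ..., P(c_n)].
    """
    if not 0.0 <= objectness <= 1.0:
        raise ValueError("objectness must lie in [0, 1]")
    if abs(sum(class_probs) - 1.0) > 1e-6:
        raise ValueError("class_probs must sum to 1")

    # A region is background either because the first stage rejects it,
    # or because it is proposed and the second stage then calls it background.
    background = (1.0 - objectness) + objectness * class_probs[0]

    # An object class requires both stages to agree: P(c) = P(c | O=1) * P(O=1).
    foreground = [objectness * p for p in class_probs[1:]]
    return [background] + foreground


if __name__ == "__main__":
    # Confident proposal, fairly confident class 1.
    print(combine_scores(0.8, [0.1, 0.7, 0.2]))
    # Same second-stage output, but a weak proposal lowers every class score.
    print(combine_scores(0.2, [0.1, 0.7, 0.2]))
```

In this sketch the combined scores still sum to 1, and a low first-stage likelihood suppresses a detection even when the second stage is confident, which is the behavior the summary attributes to the probabilistic framework.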