DeIoU: Towards Distinguishable Box Prediction in Densely Packed Object Detection

Linfei Wang, Yibing Zhan, Long Lan, Xu Lin, Dapeng Tao, Xinbo Gao
DOI: https://doi.org/10.1109/tcsvt.2024.3415657
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: The Intersection over Union (IoU) has been widely employed in various stages of object detection because it objectively quantifies the similarity between boxes. However, in densely packed scenes full of crowded and small-sized objects, adjacent positive boxes often overlap heavily. This overlap interference compromises the consistency between quality evaluation and confidence, leading to ambiguous box predictions in previous IoU-based models. To address this issue, we design a novel IoU-based learning paradigm tailored for dense scenes, called DeIoU. This approach effectively suppresses unnecessary overlap between predicted boxes and thereby enhances representation learning for non-salient objects. Specifically, it consists of a dense box regression loss, L_DeIoU, and a one-to-many (O2M) label matching strategy guided by DeIoU. During training, these components calibrate the quality of position and shape predictions and learn distinguishable object features by penalizing overlap interference between neighboring boxes. Extensive experiments on four object detection datasets, SKU-110K, CrowdHuman, MS COCO 2017, and DIOR, demonstrate that our DeIoU-based learning strategy outperforms other state-of-the-art methods. Notably, the proposed method delivers a substantial improvement across popular detectors on SKU-110K and CrowdHuman (on average 1.3 AP and 1.8 MR⁻², where a lower MR⁻² is better), while also performing competitively on small objects in natural scenes.
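The abstract does not give the formula for L_DeIoU or the exact O2M matching rule, so the following Python sketch only illustrates the general ideas it names: IoU between boxes, a regression loss that also penalizes overlap with neighboring boxes, and a one-to-many assignment in which each ground truth claims several candidates. It is not the authors' method. The functions `overlap_penalized_loss` and `one_to_many_assign`, the weight `lam`, the top-`k` count, and the threshold `thr` are hypothetical choices made for illustration, and boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
from typing import List, Tuple

# Assumed convention: a box is (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Standard Intersection over Union of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, w) * max(0.0, h)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def overlap_penalized_loss(pred: Box, target: Box,
                           neighbors: List[Box], lam: float = 0.5) -> float:
    """HYPOTHETICAL loss, not the paper's L_DeIoU.

    (1 - IoU with the assigned target) plus lam times the largest IoU
    between the prediction and any neighboring (non-target) box, so a
    prediction drifting onto an adjacent object is penalized.
    """
    regression = 1.0 - iou(pred, target)
    interference = max((iou(pred, n) for n in neighbors), default=0.0)
    return regression + lam * interference


def one_to_many_assign(anchors: List[Box], gts: List[Box],
                       k: int = 3, thr: float = 0.1) -> List[int]:
    """HYPOTHETICAL one-to-many matching, not the paper's O2M rule.

    Each ground truth claims up to k anchors with the highest IoU, provided
    the IoU is at least thr. An anchor claimed by several ground truths keeps
    the one with the higher IoU. Returns a GT index per anchor, or -1 for
    background.
    """
    assigned = [-1] * len(anchors)
    best = [0.0] * len(anchors)
    for g, gt in enumerate(gts):
        ranked = sorted(((iou(a, gt), i) for i, a in enumerate(anchors)),
                        reverse=True)[:k]
        for score, i in ranked:
            if score >= thr and score > best[i]:
                assigned[i], best[i] = g, score
    return assigned


if __name__ == "__main__":
    pred, target = (0, 0, 10, 10), (0, 0, 10, 10)
    neighbor = (5, 0, 15, 10)  # adjacent object overlapping the prediction
    print(round(overlap_penalized_loss(pred, target, [neighbor]), 4))  # 0.1667

    anchors = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 0, 30, 10), (50, 50, 60, 60)]
    gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
    print(one_to_many_assign(anchors, gts, k=2))  # [0, 0, 1, -1]
```

In the example, the prediction matches its target exactly, yet it still incurs a loss because it overlaps the adjacent box by one third. Using the maximum neighbor IoU as the interference term is just one simple choice; the paper's actual loss may weight or normalize overlap differently.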