YOLO -- You only look 10647 times

Christian Limberg, Andrew Melnik, Augustin Harter, Helge Ritter
DOI: https://doi.org/10.48550/arXiv.2201.06159
2022-01-16
Computer Vision and Pattern Recognition
Abstract: In this work, we explain the "You Only Look Once" (YOLO) single-stage object detection approach as a parallel classification of 10647 fixed region proposals. We support this view by showing that each of YOLO's output pixels is attentive to a specific sub-region of previous layers, comparable to a local region proposal. This understanding reduces the conceptual gap between YOLO-like single-stage object detection models, RCNN-like two-stage region-proposal-based models, and ResNet-like image classification models. In addition, we created interactive exploration tools for a better visual understanding of the YOLO information processing streams: https://limchr.github.io/yolo_visualization
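The abstract does not say where the figure 10647 comes from. A plausible reading, stated here as an assumption rather than taken from the text above, is the standard YOLOv3 configuration: a 416x416 input, three output scales with strides 32, 16 and 8, and 3 anchor boxes per grid cell. The minimal sketch below only reproduces that arithmetic; the function name and its parameters are hypothetical and chosen for illustration.

```python
def count_fixed_proposals(input_size=416, strides=(32, 16, 8), anchors_per_cell=3):
    """Count the output boxes of a YOLOv3-style detection head.

    Assumed (not stated in the abstract): square input, one square grid
    per stride, and the same number of anchors in every grid cell.
    """
    total = 0
    for stride in strides:
        cells_per_side = input_size // stride  # 13, 26, 52 for a 416 input
        total += cells_per_side * cells_per_side * anchors_per_cell
    return total


if __name__ == "__main__":
    # (13*13 + 26*26 + 52*52) * 3 = 3549 * 3
    print(count_fixed_proposals())  # prints 10647
```

Under these assumptions each of the 10647 outputs corresponds to one fixed grid cell and anchor pair, which is the sense in which the abstract speaks of fixed region proposals. A different input resolution would change the count.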