Pay Attention to Them: Deep Reinforcement Learning-Based Cascade Object Detection

Songtao Liu, Di Huang, Yunhong Wang
DOI: https://doi.org/10.1109/TNNLS.2019.2933451
Impact factor (IF): 14.255
2020-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: This paper proposes a novel and effective approach, namely pay attention to them (PAT), to general object detection, which integrates the bottom-up single-shot convolutional neural networks (CNNs) and a top-down operating strategy. PAT starts by routinely applying a CNN regression detector to the entire input image. It then conducts refinement, which locates a sub-region that probably contains relevant objects through an intelligent agent built with an attentional mechanism and zooms it in to launch the detector again. This refining step is repeated in a cascaded way, where all the bounding boxes produced are scaled according to the original resolution and the sub-marginal and overlapping parts are wiped out to generate the final output. Due to such progressive processing, PAT improves the detection accuracy, especially for the objects of small sizes. Extensive experiments are conducted on the Pascal VOC and MS COCO benchmarks, and the results show that PAT is able to improve the representative baseline detectors, i.e., single shot multibox detector, YOLOv2, and Faster regions with CNN features, with remarkable accuracy gains [about 2%–5% mean Average Precision (mAP)], which demonstrates its competency.
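The abstract describes the final merging step only at a high level: boxes from each zoomed-in sub-region are scaled back to the original resolution, and "sub-marginal and overlapping parts" are removed. The Python below is a minimal sketch of that bookkeeping, not PAT's published implementation. It assumes axis-aligned boxes stored as (x1, y1, x2, y2, score), a sub-region given in original-image pixels, and a crop resized to a known zoomed size. Two choices are assumptions made here for illustration: "sub-marginal" is read as "touching the border of the zoomed crop" (likely a truncated object), and overlaps are resolved with ordinary greedy non-maximum suppression (NMS). The attention agent and the detector itself are not modelled, and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)
Region = Tuple[float, float, float, float]      # (x1, y1, x2, y2) in original-image pixels
Size = Tuple[int, int]                          # (width, height) of the zoomed crop


def to_original(boxes: List[Box], region: Region, zoomed_size: Size) -> List[Box]:
    """Map boxes detected on a zoomed-in crop back to original-image coordinates."""
    rx1, ry1, rx2, ry2 = region
    zw, zh = zoomed_size
    sx = zw / (rx2 - rx1)  # horizontal zoom factor
    sy = zh / (ry2 - ry1)  # vertical zoom factor
    return [(rx1 + x1 / sx, ry1 + y1 / sy, rx1 + x2 / sx, ry1 + y2 / sy, s)
            for x1, y1, x2, y2, s in boxes]


def drop_marginal(boxes: List[Box], zoomed_size: Size, margin: float = 2.0) -> List[Box]:
    """Assumed 'sub-marginal' rule: discard crop detections within `margin` px of the crop border."""
    zw, zh = zoomed_size
    return [b for b in boxes
            if b[0] > margin and b[1] > margin and b[2] < zw - margin and b[3] < zh - margin]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], thresh: float = 0.5) -> List[Box]:
    """Greedy NMS: visit boxes by descending score, keep one only if it overlaps no kept box."""
    kept: List[Box] = []
    for b in sorted(boxes, key=lambda box: box[4], reverse=True):
        if all(iou(b, k) <= thresh for k in kept):
            kept.append(b)
    return kept


def merge_cascade(full_image_boxes: List[Box],
                  refinements: List[Tuple[List[Box], Region, Size]],
                  thresh: float = 0.5) -> List[Box]:
    """Combine full-image detections with one (crop_boxes, region, zoomed_size) per cascade step."""
    merged = list(full_image_boxes)
    for crop_boxes, region, zoomed_size in refinements:
        inside = drop_marginal(crop_boxes, zoomed_size)
        merged.extend(to_original(inside, region, zoomed_size))
    return nms(merged, thresh)


if __name__ == "__main__":
    full = [(100, 100, 140, 140, 0.6)]           # coarse detection on the whole image
    region = (80, 80, 240, 240)                  # 160x160 sub-region picked for zooming
    zoomed = (320, 320)                          # crop resized 2x
    crop = [(42, 42, 122, 122, 0.9),             # maps to (101, 101, 141, 141)
            (0, 150, 60, 200, 0.8),              # touches the crop border, dropped
            (200, 200, 240, 240, 0.7)]           # maps to (180, 180, 200, 200)
    print(merge_cascade(full, [(crop, region, zoomed)]))
    # [(101.0, 101.0, 141.0, 141.0, 0.9), (180.0, 180.0, 200.0, 200.0, 0.7)]
```

In this toy run the sharper, higher-scoring box found on the 2x crop replaces the overlapping coarse box from the first pass, which is the kind of small-object refinement the abstract attributes to the cascade.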
What problem does this paper attempt to address?