Zoom-and-Reasoning: Joint Foreground Zoom and Visual-Semantic Reasoning Detection Network for Aerial Images.

Zuhao Ge, Lizhe Qi, Yuzheng Wang, Yunquan Sun
DOI: https://doi.org/10.1109/lsp.2022.3229638
2022-01-01
IEEE Signal Processing Letters
Abstract: Aerial image object detection remains challenging because small objects cluster densely and because inter-class similarity and intra-class diversity cause confusion. To address these challenges, we propose a two-stage framework, 'Zoom & Reasoning Det,' which highlights foreground regions during detection and leverages contextual relations to assist detection. In the coarse foreground zoom stage, unlike earlier works that divide original images into patches and perform detection on each patch separately, we design a Foreground Zoom Strategy (FZS), which zooms in on dense foreground regions identified by a coarse detector and packs them into one image. In the fine reasoning detection stage, motivated by the human visual mechanism that achieves correct recognition by reasoning through context, we present the Visual-Semantic Reasoning Network (VSRNet), consisting of a Visual Reasoning Graph (VRG) and a Semantic Reasoning Graph (SRG), which jointly consider local visual and global semantic contextual relations for each instance. Each instance's feature representation is further refined by aggregating the outputs of VSRNet. Comprehensive experiments on two challenging aerial image datasets, VisDrone and UAVDT, demonstrate the advantage of our method over state-of-the-art methods.
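The zoom-and-pack idea behind FZS can be illustrated with a small geometric sketch. The abstract does not specify how dense regions are formed or how they are packed, so everything below is an assumption made for illustration: the function names (`group_into_regions`, `pack_regions`), the `margin` and `zoom` parameters, the greedy merging of overlapping boxes, and the simple shelf-packing layout are hypothetical and are not the authors' method.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def group_into_regions(boxes: List[Box], margin: int) -> List[Box]:
    """Greedily merge coarse detections whose margin-expanded boxes overlap.

    Assumption: a "dense foreground region" is the bounding box of a group
    of nearby coarse detections. Regions are not clamped to image bounds.
    """
    regions = [(x1 - margin, y1 - margin, x2 + margin, y2 + margin)
               for x1, y1, x2, y2 in boxes]
    merged = True
    while merged:  # repeat until no pair of regions overlaps
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if overlaps(regions[i], regions[j]):
                    a, b = regions[i], regions[j]
                    regions[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                  max(a[2], b[2]), max(a[3], b[3]))
                    del regions[j]
                    merged = True
                    break
            if merged:
                break
    return regions


def pack_regions(regions: List[Box], zoom: int, canvas_width: int):
    """Shelf-pack zoomed regions into one canvas.

    Returns a list of (source_region, target_box) pairs and the canvas height.
    Assumption: every region is enlarged by the same integer zoom factor.
    """
    placements = []
    x = y = shelf_h = 0
    # Placing taller regions first keeps each shelf reasonably tight.
    for r in sorted(regions, key=lambda r: r[3] - r[1], reverse=True):
        w, h = (r[2] - r[0]) * zoom, (r[3] - r[1]) * zoom
        if x + w > canvas_width and x > 0:  # start a new shelf
            x, y, shelf_h = 0, y + shelf_h, 0
        placements.append((r, (x, y, x + w, y + h)))
        x += w
        shelf_h = max(shelf_h, h)
    return placements, y + shelf_h
```

In this sketch, a second-stage detector would run once on the packed canvas, and each detection would be mapped back to the original image through the stored source and target boxes. That mapping step is omitted here.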