Towards Better Object Detection in Scale Variation with Adaptive Feature Selection

Zehui Gong, Dong Li
DOI: https://doi.org/10.48550/arXiv.2012.03265
2020-12-09
Abstract: It is common practice to exploit pyramidal feature representations to tackle scale variation among object instances. However, most existing methods still predict objects in a certain range of scales based solely or mainly on a single-level representation, which yields inferior detection performance. To this end, we propose a novel adaptive feature selection module (AFSM) that automatically learns, in a data-driven manner, how to fuse multi-level representations in the channel dimension. It significantly improves the performance of detectors that have a feature pyramid structure while adding almost no inference overhead. Moreover, a class-aware sampling mechanism (CASM) is proposed to tackle the class imbalance problem by re-weighting the sampling ratio of each training image based on the statistical characteristics of each class. This is crucial for improving the performance of the minority classes. Experimental results demonstrate the effectiveness of the proposed method, with 83.04% mAP at 15.96 FPS on the VOC dataset and 39.48% AP on the VisDrone-DET validation subset, considerably outperforming other state-of-the-art detectors. The code is available at https://github.com/ZeHuiGong/AFSM.git.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the drop in detection performance caused by large variation in object scale. Although feature pyramid structures have had some success with objects of different scales, the paper points out that most methods still predict objects within a given scale range mainly from a single pyramid level, which limits detection performance. To address this, the authors propose an Adaptive Feature Selection Module (AFSM) that learns, in a data-driven way, how to fuse multi-level feature representations along the channel dimension. It significantly improves detectors built on a feature pyramid while adding almost no inference overhead.

The paper also proposes a Class-Aware Sampling Mechanism (CASM) for the class imbalance problem. During training, CASM assigns each image a sampling weight according to the number of objects in each class, which strengthens the supervision signal for minority classes and improves their detection performance.

Overall, AFSM and CASM aim to make object detectors more robust to large scale variation and class imbalance. Experiments show that the method reaches 83.04% mAP on the VOC dataset and 39.48% AP on the VisDrone-DET validation subset, significantly outperforming other state-of-the-art detectors.
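The summary says AFSM fuses pyramid levels "in the channel dimension" but does not give its internals, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' module (the linked repository holds their actual implementation). It assumes all levels have already been resized to a common resolution, squeezes each channel by global average pooling, feeds it through a hypothetical per-channel linear gate (`gate_w`, `gate_b`), and applies a softmax across levels so every channel picks its own mixture of pyramid levels. Plain Python lists are used instead of a tensor library to keep it self-contained.

```python
import math
from typing import List

# One pyramid level: C channels, each a flattened H*W feature map.
# Assumption: every level was already resized to the same resolution.
Level = List[List[float]]


def global_avg_pool(level: Level) -> List[float]:
    """Squeeze each channel of one level into a single scalar."""
    return [sum(channel) / len(channel) for channel in level]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_channel_fusion(
    levels: List[Level],
    gate_w: List[List[float]],
    gate_b: List[List[float]],
) -> Level:
    """Fuse pyramid levels channel by channel with softmax weights over levels.

    gate_w[k][c] and gate_b[k][c] are hypothetical learned parameters of a
    per-channel gate; in a real detector they would be trained end to end.
    """
    num_levels = len(levels)
    num_channels = len(levels[0])
    pooled = [global_avg_pool(level) for level in levels]  # shape [L][C]
    fused: Level = []
    for c in range(num_channels):
        # One logit per level for this channel, from its pooled descriptor.
        logits = [gate_w[k][c] * pooled[k][c] + gate_b[k][c] for k in range(num_levels)]
        weights = softmax(logits)  # weights over the levels sum to 1
        size = len(levels[0][c])
        fused.append(
            [sum(weights[k] * levels[k][c][i] for k in range(num_levels)) for i in range(size)]
        )
    return fused


if __name__ == "__main__":
    # Two levels, one channel, two pixels; all-zero gates give equal weights.
    levels = [[[1.0, 3.0]], [[5.0, 7.0]]]
    zeros = [[0.0], [0.0]]
    print(adaptive_channel_fusion(levels, zeros, zeros))  # [[3.0, 5.0]]
```

Because the weights are normalized per channel, a channel can rely almost entirely on one level (for example, a fine level for small objects) while another channel blends several. The extra cost is one pooling step and a small gate per channel, which is in the spirit of the "almost no inference overhead" claim, though the real module may be structured differently.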
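CASM is described only as re-weighting each image's sampling ratio from per-class object counts. The sketch below illustrates one plausible rule, which is an assumption rather than the paper's formula: an image's raw weight is the sum of 1 / N_c over the distinct classes c it contains, where N_c is the total number of annotated objects of class c in the training set. The weights are normalized into probabilities, and an epoch is drawn with replacement using Python's `random.choices`. The function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def class_aware_image_weights(image_labels: Sequence[Sequence[str]]) -> List[float]:
    """Return a sampling probability for every training image.

    Assumed (hypothetical) rule: raw weight = sum of 1 / N_c over the distinct
    classes c in the image, with N_c the object count of class c in the whole
    training set. Images with rare classes are therefore sampled more often.
    Assumes at least one image has an annotation; images with no objects get
    weight 0 and are never drawn.
    """
    class_counts = Counter(label for labels in image_labels for label in labels)
    raw = [sum(1.0 / class_counts[c] for c in set(labels)) for labels in image_labels]
    total = sum(raw)
    return [w / total for w in raw]


def sample_epoch(probs: Sequence[float], num_samples: int, seed: int = 0) -> List[int]:
    """Draw image indices with replacement according to probs."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=num_samples)


if __name__ == "__main__":
    labels = [["car", "car", "pedestrian"], ["car"], ["bus"]]
    probs = class_aware_image_weights(labels)
    print([round(p, 3) for p in probs])  # [0.5, 0.125, 0.375]
    print(sample_epoch(probs, num_samples=8))
```

In this toy set, "car" has three instances while "pedestrian" and "bus" have one each, so the image containing only a car receives the smallest share. In a PyTorch training loop, the same probabilities could instead be passed to a weighted sampler for the data loader.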