Gated CNN: Integrating multi-scale feature layers for object detection
Jin Yuan, Heng-Chang Xiong, Yi Xiao, Weili Guan, Meng Wang, Richang Hong, Zhi-Yong Li
DOI: https://doi.org/10.1016/j.patcog.2019.107131
IF: 8
2020-09-01
Pattern Recognition
Abstract:<p>Different convolutional layers in an explainable CNN usually encode different kinds of semantic information about an image, so feature fusion approaches such as SSD, DSSD, and FPN are widely used to improve detection performance by integrating the results obtained from multiple convolutional layers. However, typical fusion approaches must first detect objects independently on each single convolutional layer before fusion, and that single layer may contain noise or be irrelevant to the objects, leading to detection failures. To tackle this problem, this paper proposes "Gated CNN" ("G-CNN" for short), which introduces a "gate" structure to integrate multiple convolutional layers for object detection. Fed with multi-scale feature layers, a gate employs several filters to extract useful information and block noise by simultaneously executing one more convolutional or deconvolutional operation, so a gate-based feature layer is more effective and efficient than a convolutional one. In addition, G-CNN employs a detector with two branches that predict the locations and the categories of objects, respectively, together with an inter-class loss that helps the detectors learn the differences among categories. As a result, the learned detectors can better distinguish similar objects that belong to different categories. Extensive experiments on two image datasets (PASCAL VOC and COCO) demonstrate that G-CNN outperforms state-of-the-art approaches, with an mAP of 40.9% at 10.6 FPS.</p>
computer science, artificial intelligence; engineering, electrical & electronic
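
The gate idea in the abstract (merging feature layers of different scales while suppressing noisy responses) can be illustrated with a small sketch. The abstract does not specify the gate's internal design, so everything below is an assumption made for illustration: a generic sigmoid-gated blend of one fine and one coarse single-channel feature map, with nearest-neighbour upsampling standing in for the deconvolution and scalar weights standing in for learned filters. The names `upsample_nearest`, `gated_fusion`, `w_fine`, `w_coarse`, and `bias` are hypothetical and are not taken from the paper.

```python
import math


def upsample_nearest(fmap, factor=2):
    """Repeat every value `factor` times along both axes.

    Assumption: this stands in for the deconvolution mentioned in the abstract.
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(fine, coarse, w_fine=1.0, w_coarse=1.0, bias=0.0):
    """Blend a fine and a coarse feature map with a per-location gate.

    Hypothetical gate: g = sigmoid(w_fine * f + w_coarse * c + bias),
    output = g * f + (1 - g) * c. The scalar weights play the role of a
    learned 1x1 filter; this is not the paper's actual gate.
    """
    coarse_up = upsample_nearest(coarse, factor=len(fine) // len(coarse))
    fused = []
    for f_row, c_row in zip(fine, coarse_up):
        fused_row = []
        for f, c in zip(f_row, c_row):
            g = sigmoid(w_fine * f + w_coarse * c + bias)
            fused_row.append(g * f + (1.0 - g) * c)
        fused.append(fused_row)
    return fused


# Toy 4x4 fine map and 2x2 coarse map (made-up values).
fine = [[0.0, 2.0, 0.0, 0.0],
        [0.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
coarse = [[1.0, 0.0],
          [0.0, 0.0]]
fused = gated_fusion(fine, coarse)
```

Because the output is a convex combination of the two inputs at each location, every fused value lies between the fine and the upsampled coarse response, and locations where both layers are silent stay at zero. A real gate would operate on multi-channel tensors with trained filters.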