Wide-residual-inception networks for real-time object detection

Youngwan Lee, Huieun Kim, Eunsoo Park, Xuenan Cui, Hakil Kim
DOI: https://doi.org/10.1109/ivs.2017.7995808
2017-06-01
Abstract: Since convolutional neural network (CNN) models emerged, several tasks in computer vision have actively deployed CNN models for feature extraction. However, conventional CNN models have a high computational cost and require high memory capacity, which makes them impractical and unaffordable for commercial applications such as real-time on-road object detection on embedded boards or mobile platforms. To tackle this limitation, this paper proposes a wide-residual-inception (WR-Inception) network, whose architecture is built on a residual inception unit that captures objects of various sizes on the same feature map and uses shallower and wider layers than state-of-the-art networks such as ResNets. To verify the proposed networks, this paper conducts two experiments: a classification task on CIFAR-10/100 and an on-road object detection task using a Single-Shot Multi-box Detector (SSD) on the KITTI dataset. WR-Inception achieves comparable accuracy on CIFAR-10/100, with test errors of 4.82% and 23.12%, respectively, outperforming 164-layer Pre-ResNets. In addition, the detection experiments demonstrate that the WR-Inception-based SSD outperforms the ResNet-101-based SSD on KITTI. Moreover, the WR-Inception-based SSD achieves 16 frames per second, which is 3.85 times faster than the ResNet-101-based SSD. We expect WR-Inception to be usable in real application systems.
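The speed figures in the abstract can be unpacked with a little arithmetic. The sketch below assumes that "3.85 times faster" means the ratio of the two frame rates; under that reading, the ResNet-101-based SSD would run at roughly 4.2 frames per second. The variable and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
wr_inception_fps = 16.0  # WR-Inception-based SSD throughput (frames per second)
speedup = 3.85           # reported speedup over the ResNet-101-based SSD

# Assumption: "3.85 times faster" is the ratio of the two frame rates.
resnet101_fps = wr_inception_fps / speedup


def latency_ms(fps):
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


print(f"WR-Inception SSD:         {wr_inception_fps:.2f} FPS, "
      f"{latency_ms(wr_inception_fps):.1f} ms/frame")
print(f"ResNet-101 SSD (implied): {resnet101_fps:.2f} FPS, "
      f"{latency_ms(resnet101_fps):.1f} ms/frame")
```

Under the same assumption, the per-frame budget drops from about 240 ms to 62.5 ms, which is the difference the abstract points to for real-time on-road use.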