Abstract: CNN architectures are generally heavy on memory and computational requirements, which makes them infeasible for embedded systems with limited hardware resources. We propose dual convolutional kernels (DualConv) for constructing lightweight deep neural networks. DualConv combines 3$\times$3 and 1$\times$1 convolutional kernels to process the same input feature map channels simultaneously and exploits the group convolution technique to efficiently arrange convolutional filters. DualConv can be employed in any CNN model such as VGG-16 and ResNet-50 for image classification, YOLO and R-CNN for object detection, or FCN for semantic segmentation. In this paper, we extensively test DualConv for classification since these network architectures form the backbones for many other tasks. We also test DualConv for image detection on YOLO-V3. Experimental results show that, combined with our structural innovations, DualConv significantly reduces the computational cost and number of parameters of deep neural networks while surprisingly achieving slightly higher accuracy than the original models in some cases. We use DualConv to further reduce the number of parameters of the lightweight MobileNetV2 by 54% with only a 0.68% drop in accuracy on the CIFAR-100 dataset. When the number of parameters is not an issue, DualConv increases the accuracy of MobileNetV1 by 4.11% on the same dataset. Furthermore, DualConv significantly improves the YOLO-V3 object detection speed and improves its accuracy by 4.4% on the PASCAL VOC dataset.
What problem does this paper attempt to address?
The paper addresses the memory and computational limits that make modern convolutional neural networks (CNNs) hard to deploy on embedded systems and mobile platforms. It proposes a new convolutional kernel design, DualConv (dual convolution), for building lightweight deep neural networks. DualConv applies 3×3 and 1×1 convolutional kernels to the same input feature map channels simultaneously and uses group convolution to arrange the convolutional filters efficiently. The goal is to reduce computational cost and parameter count while maintaining, and in some cases improving, model accuracy.
### Main Problems
1. **High Computational and Storage Resource Requirements**: Modern CNN models are deep and complex, so their size (the number of parameters/weights) and the computation they require have grown substantially. As a result, these models can typically run only on servers equipped with high-performance GPUs.
2. **Limitations of Embedded Devices and Mobile Platforms**: Although there is a huge demand for deploying deep models on embedded devices and mobile platforms, current network architectures are not suitable for these systems because of their limited memory, power, and computational resources.
3. **Balance between Performance and Efficiency**: On embedded devices and mobile platforms, network accuracy, computational complexity, and the number of parameters are all important factors for evaluating different network architectures. Therefore, designing lightweight and accurate CNN models that can be deployed on these platforms has become an active research direction.
### Solutions
The DualConv proposed in the paper solves the above problems in the following ways:
- **Combining 3×3 and 1×1 Convolution Kernels**: Simultaneously process the same input feature map channels, retain the original information of the input feature map, and enable deeper convolutional layers to extract information more effectively.
- **Utilizing Group Convolution Techniques**: Efficiently arrange convolutional filters to reduce computational cost.
- **Wide Applicability**: It can replace standard convolutions in existing CNN models such as VGG-16, ResNet-50, YOLO, and R-CNN, and is applicable to image classification, object detection, and semantic segmentation tasks.
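To make the parameter saving concrete, here is a minimal counting sketch. The summary above does not spell out the exact filter arrangement, so the code assumes one plausible reading: a DualConv layer is a 3×3 group convolution with `groups` groups plus a 1×1 pointwise convolution over all input channels, with the two outputs summed. The function names (`conv_params`, `dualconv_params`, `dualconv_ratio`) and the channel sizes are illustrative, not taken from the paper, and biases are ignored.

```python
def conv_params(in_ch, out_ch, kernel, groups=1):
    """Weight count of a bias-free 2D convolution with square kernels."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("in_ch and out_ch must be divisible by groups")
    # Each output filter only sees in_ch / groups input channels.
    return (in_ch // groups) * out_ch * kernel * kernel


def dualconv_params(in_ch, out_ch, groups):
    """Weight count under the ASSUMED DualConv decomposition:
    a 3x3 group convolution plus a 1x1 pointwise convolution."""
    grouped_3x3 = conv_params(in_ch, out_ch, 3, groups)
    pointwise_1x1 = conv_params(in_ch, out_ch, 1)
    return grouped_3x3 + pointwise_1x1


def dualconv_ratio(in_ch, out_ch, groups):
    """Parameters of the assumed DualConv layer relative to a standard 3x3 conv."""
    return dualconv_params(in_ch, out_ch, groups) / conv_params(in_ch, out_ch, 3)


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 4 groups.
    print(conv_params(64, 128, 3))      # standard 3x3 convolution
    print(dualconv_params(64, 128, 4))  # assumed DualConv layer
    print(round(dualconv_ratio(64, 128, 4), 4))
```

Under this assumption the ratio simplifies to 1/G + 1/9 for G groups, independent of the channel counts, so larger group counts give diminishing returns as the 1×1 term starts to dominate.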
### Experimental Results
The experimental results show that DualConv significantly reduces the computational cost and the number of parameters of deep neural networks, and in some cases even achieves higher accuracy than the original model. For example:
- **On the CIFAR-100 dataset**, DualConv reduces the number of parameters of MobileNetV2 by 54% with only a 0.68% drop in accuracy.
- **On the PASCAL VOC dataset**, YOLO-V3 with DualConv improves accuracy by 4.4% while also significantly increasing detection speed.
In conclusion, DualConv eases the resource constraints that modern CNN models face when deployed on embedded systems and mobile platforms, and it offers a new approach to designing lightweight deep neural networks.