Scale-Invariant Object Detection by Adaptive Convolution with Unified Global-Local Context

Amrita Singh, Snehasis Mukherjee
2024-09-17
Abstract: Dense features are important for detecting minute objects in images. Unfortunately, despite the remarkable efficacy of CNN models in multi-scale object detection, CNN models often fail to detect smaller objects in images due to the loss of dense features during the pooling process. Atrous convolution addresses this issue by applying sparse kernels. However, sparse kernels can often lose the multi-scale detection efficacy of the CNN model. In this paper, we propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the EfficientDet model. A fixed atrous rate limits the performance of the CNN models in the convolutional layers. To overcome this limitation, we introduce a switchable mechanism that allows for dynamically adjusting the atrous rate during the forward pass. The proposed SAC-Net encapsulates the benefits of both low-level and high-level features to achieve improved performance on multi-scale object detection tasks, without losing the dense features. Further, we apply a depth-wise switchable atrous rate to the proposed network, to improve the scale-invariant features. Finally, we apply global context on the proposed model. Our extensive experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms the state-of-the-art models by a significant margin in terms of accuracy.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the difficulty of detecting small objects in images that contain objects at many scales. Traditional Convolutional Neural Network (CNN) models lose dense features during pooling, so although existing deep-learning techniques have made significant progress in multi-scale object detection, they still struggle with smaller objects. To overcome this limitation, the authors propose a Switchable Atrous Convolutional Network (SAC-Net) based on the EfficientDet model. A switchable mechanism dynamically adjusts the atrous (dilation) rate during the forward pass, so the model combines the benefits of low-level and high-level features and improves multi-scale detection without losing dense features. Specifically, the main contributions of the paper are as follows:

1. **Adaptive Dilated Convolution Layer**: Apply adaptive dilated convolution layers whose dilation rates are adjusted according to the depth of the layer in the architecture.
2. **Global Context**: Apply global context before and after the depth-wise convolution layer, making the proposed method scale-invariant.
3. **Depth-wise Dilated Convolution on a Lightweight Model**: Apply depth-wise dilated convolution and global context to the lightweight EfficientDet model to enhance performance while reducing the number of model parameters.

Through these improvements, the paper aims to improve detection accuracy for objects at different scales, especially smaller objects. Experimental results show that the proposed SAC-Net outperforms existing state-of-the-art models on benchmark datasets.
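The idea of switching between dilation rates and adding global context can be illustrated with a small sketch. This is a minimal 1-D example in plain Python, not the authors' implementation: the function names, the choice of two candidate rates, and the gate (a sigmoid of the input's global mean, standing in for a learned switch) are all assumptions made for illustration.

```python
import math


def atrous_conv1d(signal, kernel, rate):
    """Zero-padded, same-length 1-D atrous (dilated) convolution."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # taps are `rate` samples apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(k, rate):
    """Span covered by a k-tap kernel at the given atrous rate."""
    return k + (k - 1) * (rate - 1)


def add_global_context(signal):
    """Add the global average of the signal to every position."""
    g = sum(signal) / len(signal)
    return [x + g for x in signal]


def switchable_atrous_conv1d(signal, kernel, rates=(1, 3),
                             gate_weight=1.0, gate_bias=0.0):
    """Blend two atrous rates with an input-dependent switch in [0, 1].

    The gate parameters are hypothetical; in a trained network they
    would be learned rather than fixed by hand.
    """
    global_mean = sum(signal) / len(signal)
    switch = 1.0 / (1.0 + math.exp(-(gate_weight * global_mean + gate_bias)))
    small = atrous_conv1d(signal, kernel, rates[0])
    large = atrous_conv1d(signal, kernel, rates[1])
    blended = [switch * a + (1.0 - switch) * b for a, b in zip(small, large)]
    return blended, switch


if __name__ == "__main__":
    signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    kernel = [1.0, 1.0, 1.0]
    # Global context before and after the switchable convolution.
    x = add_global_context(signal)
    y, switch = switchable_atrous_conv1d(x, kernel, rates=(1, 2))
    y = add_global_context(y)
    print("switch:", switch)
    print("output:", y)
    print("span at rate 2:", effective_kernel_size(3, 2))
```

With the same three-tap kernel, a higher rate covers a wider span without adding weights, and the blend lets the input decide how much of each receptive field to use. A 2-D, depth-wise version would apply the same logic per channel.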