Poly Kernel Inception Network for Remote Sensing Detection

Xinhao Cai, Qiuxia Lai, Yuwei Wang, Wenguan Wang, Zeren Sun, Yazhou Yao
2024-03-20
Abstract: Object detection in remote sensing images (RSIs) often suffers from several increasing challenges, including the large variation in object scales and the diverse-ranging context. Prior methods tried to address these challenges by expanding the spatial receptive field of the backbone, either through large-kernel convolution or dilated convolution. However, the former typically introduces considerable background noise, while the latter risks generating overly sparse feature representations. In this paper, we introduce the Poly Kernel Inception Network (PKINet) to handle the above challenges. PKINet employs multi-scale convolution kernels without dilation to extract object features of varying scales and capture local context. In addition, a Context Anchor Attention (CAA) module is introduced in parallel to capture long-range contextual information. These two components work jointly to advance the performance of PKINet on four challenging remote sensing detection benchmarks, namely DOTA-v1.0, DOTA-v1.5, HRSC2016, and DIOR-R.
Computer Science
What problem does this paper attempt to address?
The paper addresses object detection in remote sensing images, in particular how to handle large variations in object scale and diverse contextual information. To this end, the authors propose a multi-scale convolutional network called the Poly Kernel Inception Network (PKINet). Its core contributions are:

1. **Multi-scale texture feature extraction**: By employing depthwise separable convolution kernels of different sizes (without dilation), PKINet extracts texture features at several receptive fields and adaptively fuses them through a channel fusion mechanism to capture local contextual information.
2. **Long-range context capture**: The Context Anchor Attention (CAA) module captures long-range contextual information, further enhancing the feature representation of the central region.
3. **Lightweight design**: By relying on depthwise separable convolutions and 1D convolutions, the model has fewer parameters and high computational efficiency.

With these components, PKINet improves performance on four challenging remote sensing detection benchmarks (DOTA-v1.0, DOTA-v1.5, HRSC2016, and DIOR-R). The experimental results indicate that PKINet handles variations in object scale and also exploits the contextual information around objects, which improves detection accuracy in remote sensing images.
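
The lightweight-design claim can be illustrated with a rough weight count. The sketch below is not taken from the paper: the channel width (64) and the kernel sizes (3, 5, 7, 9, 11) are illustrative assumptions, the helper names are hypothetical, and biases and normalization layers are ignored. It only compares how many weights dense, depthwise, and 1D strip convolutions need for the same kernel sizes.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_conv_params(channels, k):
    """Weights in a depthwise k x k convolution: one k x k filter per channel."""
    return channels * k * k


def pointwise_conv_params(c_in, c_out):
    """Weights in a 1 x 1 convolution that mixes channels."""
    return c_in * c_out


def strip_conv_params(channels, k):
    """Weights in a depthwise 1 x k convolution followed by a depthwise k x 1."""
    return 2 * channels * k


# Illustrative values, assumed for this sketch (not stated in the text above).
CHANNELS = 64
KERNEL_SIZES = (3, 5, 7, 9, 11)

# Parallel dense branches, one per kernel size.
dense_total = sum(
    standard_conv_params(CHANNELS, CHANNELS, k) for k in KERNEL_SIZES
)

# Parallel depthwise branches plus a single 1 x 1 convolution to fuse channels.
depthwise_total = sum(
    depthwise_conv_params(CHANNELS, k) for k in KERNEL_SIZES
) + pointwise_conv_params(CHANNELS, CHANNELS)

# A pair of 1D strip convolutions versus one square depthwise kernel of size 11.
strip_11 = strip_conv_params(CHANNELS, 11)
square_11 = depthwise_conv_params(CHANNELS, 11)

print("dense multi-kernel weights:    ", dense_total)
print("depthwise multi-kernel weights:", depthwise_total)
print("1D strip pair (k=11) weights:  ", strip_11)
print("square depthwise (k=11) weights:", square_11)
```

Under these assumptions, the depthwise variant needs roughly 50 times fewer weights than the dense one, and the strip pair grows linearly in the kernel size while a square kernel grows quadratically. Actual savings in PKINet depend on its real channel widths and layer layout.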