RemDet: Rethinking Efficient Model Design for UAV Object Detection

Chen Li,Rui Zhao,Zeyu Wang,Huiying Xu,Xinzhong Zhu
2024-12-13
Abstract: Object detection in Unmanned Aerial Vehicle (UAV) images has emerged as a focal area of research and presents two significant challenges: i) objects are typically small and dense within vast images; ii) computational resource constraints render most models unsuitable for real-time deployment. Current real-time object detectors are not optimized for UAV images, and complex methods designed for small object detection often lack real-time capabilities. To address these challenges, we propose a novel detector, RemDet (Reparameter efficient multiplication Detector). Our contributions are as follows: 1) We rethink the challenges that small and dense UAV images pose for existing detectors and propose information loss as a design guideline for efficient models. 2) We introduce the ChannelC2f module to enhance small object detection performance, demonstrating that high-dimensional representations can effectively mitigate information loss. 3) We design the GatedFFN module to provide not only strong performance but also low latency, effectively addressing the challenges of real-time detection. Our research reveals that GatedFFN, through the use of multiplication, is more cost-effective than feed-forward networks for high-dimensional representation. 4) We propose the CED module, which combines the advantages of ViT and CNN downsampling to effectively reduce information loss. It specifically enhances context information for small and dense objects. Extensive experiments on the large UAV datasets VisDrone and UAVDT validate the real-time efficiency and superior performance of our methods.
On the challenging UAV dataset VisDrone, our methods not only provide state-of-the-art results, improving detection by more than 3.4%, but also achieve 110 FPS on a single 4090. Code is available at https://github.com/HZAI-ZJNU/RemDet.
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?

This paper aims to solve two main challenges in object detection for Unmanned Aerial Vehicle (UAV) images:

1. **Small and Dense Objects**: In UAV images, objects are usually small and densely distributed across large-scale images, which makes them difficult to detect.
2. **Computational Resource Limitations**: Because of limited computational resources, most existing models cannot meet the requirements of real-time deployment. Current real-time object detectors are not optimized for UAV images, and complex small-object detection methods often lack real-time processing capabilities.

Specifically, the paper proposes a new detector named RemDet (Reparameter efficient multiplication Detector) to address these challenges. The main contributions of the paper are:

1. **Rethinking the Design of Existing Detectors**: The paper introduces information loss as a guiding principle for designing efficient models and proposes a simple, effective structure to enhance small-object detection.
2. **Introducing the ChannelC2f Module**: This module expands the number of channels, improves small-object detection performance, and demonstrates that high-dimensional representations can effectively reduce information loss.
3. **Designing the GatedFFN Module**: Multiplication operations yield efficient high-dimensional representations, providing strong performance while keeping latency low enough for real-time detection.
4. **Proposing the CED Module**: By combining the advantages of ViT and CNN downsampling, this module reduces information loss and, in particular, enhances the extraction of context information for small and dense objects.

Extensive experiments on the large-scale UAV datasets VisDrone and UAVDT verify the superiority of RemDet in terms of real-time efficiency and performance.
In particular, on the VisDrone dataset, RemDet not only improves the detection accuracy by more than 3.4% but also achieves an inference speed of 110 FPS on a single 4090 GPU.

### Formula Presentation

Some of the formulas involved in the paper are as follows:

- **Lagrangian Function**:
  \[ L_p(x'|x) = I(X; X') + \beta I(X; Y|X') \]
  where \(\beta \geq 0\) is a slack variable used to balance complexity and irrelevance.
- **MLP Representation**:
  \[ w_0^T x = w_1^T x + w_2^T x = \left(\sum_{i=1}^{d} w_{1i} x_i\right) + \left(\sum_{j=1}^{d} w_{2j} x_j\right) \]
- **Multiplication Representation**:
  \[ (w_1^T x) * (w_2^T x) = \left(\sum_{i=1}^{d} w_{1i} x_i\right) * \left(\sum_{j=1}^{d} w_{2j} x_j\right) = \sum_{i=1}^{d} \sum_{j=1}^{d} w_{1i} w_{2j} x_i x_j \]

These formulas show the mathematical principles behind the model design, especially how more efficient high-dimensional representations are achieved through multiplication operations.
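
The contrast between the MLP and multiplication representations can be checked numerically. The following is a minimal sketch in plain Python, not the paper's GatedFFN implementation: the function names (`additive`, `multiplicative`, `pairwise_expansion`, `num_quadratic_terms`) and the toy vectors are hypothetical, chosen only for illustration. It shows that the sum of two projections collapses to a single linear map with weights \(w_1 + w_2\), while their product expands into the double sum over \(x_i x_j\), which spans \(d(d+1)/2\) distinct quadratic monomials.

```python
def dot(w, x):
    """Inner product w^T x for two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def additive(w1, w2, x):
    """MLP-style sum w1^T x + w2^T x; still linear in x."""
    return dot(w1, x) + dot(w2, x)


def multiplicative(w1, w2, x):
    """Product of two linear projections, (w1^T x) * (w2^T x)."""
    return dot(w1, x) * dot(w2, x)


def pairwise_expansion(w1, w2, x):
    """Right-hand side of the multiplication formula: the double sum
    over i, j of w1[i] * w2[j] * x[i] * x[j]."""
    d = len(x)
    return sum(w1[i] * w2[j] * x[i] * x[j] for i in range(d) for j in range(d))


def num_quadratic_terms(d):
    """Number of distinct monomials x_i * x_j with i <= j."""
    return d * (d + 1) // 2


# Toy values (assumptions for illustration, not taken from the paper).
w1 = [1, 0, 2]
w2 = [0, 1, -1]
x = [2, 3, 4]

# The sum equals one linear map with merged weights w0 = w1 + w2.
w0 = [a + b for a, b in zip(w1, w2)]
print(additive(w1, w2, x) == dot(w0, x))

# The product equals the pairwise (quadratic) expansion.
print(multiplicative(w1, w2, x) == pairwise_expansion(w1, w2, x))

# A d-dimensional input implicitly reaches d(d+1)/2 quadratic features.
print(num_quadratic_terms(len(x)))
```

With integer inputs both comparisons are exact, so the two checks print `True`. The point of the sketch is only that multiplication reaches second-order terms at the cost of two projections, whereas addition never leaves the original \(d\)-dimensional linear space.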