YOLO-ELA: Efficient Local Attention Modeling for High-Performance Real-Time Insulator Defect Detection

Olalekan Akindele, Joshua Atolagbe
2024-10-16
Abstract: Existing methods for detecting insulator defects from unmanned aerial vehicle (UAV) imagery struggle with complex background scenes and small objects, leading to suboptimal accuracy and a high number of false-positive detections. Using the concept of local attention modeling, this paper proposes a new attention-based foundation architecture, YOLO-ELA, to address this issue. Efficient Local Attention (ELA) blocks were added to the neck of the one-stage YOLOv8 architecture to shift the model's attention from background features towards features of insulators with defects. The SCYLLA Intersection-Over-Union (SIoU) criterion function was used to reduce detection loss, accelerate model convergence, and increase the model's sensitivity towards small insulator defects, yielding more true positives. Because the dataset was limited, data augmentation techniques were used to increase its diversity. In addition, we leveraged transfer learning to improve the model's performance. Experimental results on high-resolution UAV images show that our method achieved a state-of-the-art performance of 96.9% mAP0.5 and a real-time detection speed of 74.63 frames per second, outperforming the baseline model. This further demonstrates the effectiveness of attention-based convolutional neural networks (CNNs) in object detection tasks.
Computer Vision and Pattern Recognition
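The abstract names Efficient Local Attention without spelling out the mechanism. As ELA is usually described, each channel is averaged along each spatial axis, each resulting strip passes through a 1D convolution, group normalization, and a sigmoid, and the two resulting gates rescale the input. The pure-Python sketch below illustrates that data flow on nested lists; it is not the authors' implementation. The function names, the fixed smoothing kernel shared across channels, and the omission of learned normalization parameters are all assumptions made for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1d_same(seq, kernel):
    """Zero-padded 1D cross-correlation that keeps the sequence length.

    Assumes an odd-length kernel. In a trained model the weights are learned;
    here the caller passes a fixed kernel.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(seq))
    ]


def group_norm(rows, num_groups, eps=1e-5):
    """Normalize [C][L] values to zero mean and unit variance per channel group.

    Assumes C is divisible by num_groups; learned scale and shift are omitted.
    """
    per_group = len(rows) // num_groups
    out = []
    for g in range(num_groups):
        group = rows[g * per_group:(g + 1) * per_group]
        values = [v for row in group for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in row] for row in group])
    return out


def ela(x, kernel, num_groups=1):
    """Gate a feature map stored as nested lists [C][H][W]."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])

    # Strip pooling: one average per row (z_h) and one per column (z_w).
    z_h = [[sum(row) / width for row in ch] for ch in x]
    z_w = [
        [sum(ch[h][w] for h in range(height)) / height for w in range(width)]
        for ch in x
    ]

    def gate(z):
        # 1D convolution -> group normalization -> sigmoid, per direction.
        normed = group_norm([conv1d_same(row, kernel) for row in z], num_groups)
        return [[sigmoid(v) for v in row] for row in normed]

    y_h, y_w = gate(z_h), gate(z_w)

    # Rescale every activation by its row gate and its column gate.
    return [
        [
            [x[c][h][w] * y_h[c][h] * y_w[c][w] for w in range(width)]
            for h in range(height)
        ]
        for c in range(channels)
    ]


if __name__ == "__main__":
    feature_map = [
        [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        [[-1.0, 0.5, 2.0], [3.0, -2.0, 1.0]],
    ]
    for channel in ela(feature_map, kernel=[0.25, 0.5, 0.25]):
        print(channel)
```

Because both gates lie strictly between 0 and 1, this block can only attenuate activations; which rows and columns are attenuated least depends on the kernel, which would be learned in a trained network.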
What problem does this paper attempt to address?
The paper addresses the limited accuracy and high false-positive rate of existing insulator defect detection methods in complex background scenes and in small-target detection. Specifically:

1. **Complex background interference**: Existing detection methods have difficulty distinguishing background features from those of defective insulators, which leads to inaccurate detections.
2. **Difficulty in small-target detection**: Insulator defects usually appear as small targets in images taken by drones, which makes them hard to detect.
3. **Low dataset diversity**: Because the available data are limited, training lacks sufficient diversity, which hurts the model's ability to generalize.

To solve these problems, the authors propose **YOLO-ELA**, a new method based on the YOLOv8 architecture that uses Efficient Local Attention (ELA) modeling. The ELA mechanism shifts the model's attention from background features to those of defective insulators, improving detection accuracy and reducing false positives. In addition, the SCYLLA Intersection-Over-Union (SIoU) loss function is used to accelerate model convergence and increase sensitivity to small targets. Experimental results on high-resolution drone images show a real-time detection speed of 74.63 frames per second and an mAP0.5 of 96.9%, outperforming the baseline model.

### Formula summary

1. **Spatial attention calculation in the ELA module**:
   - Feature vector in the horizontal direction:
     \[ z^h_c(h)=\frac{1}{H}\sum_{0\leq i < H}x_c(h, i) \]
   - Feature vector in the vertical direction:
     \[ z^w_c(w)=\frac{1}{W}\sum_{0\leq j < W}x_c(j, w) \]
   - Position attention maps in the horizontal and vertical directions:
     \[ y^h = \sigma\left(\text{Gn}\left(F_h(z^h)\right)\right) \]
     \[ y^w = \sigma\left(\text{Gn}\left(F_w(z^w)\right)\right) \]
   - The final local attention map:
     \[ Y = x_c\times y^h\times y^w \]
2. **SIoU loss function**:
   - Angle loss:
     \[ \Lambda = 1 - 2\sin^2\left(\arcsin(x)-\frac{\pi}{4}\right) \]
     where
     \[ x=\frac{c_h}{\sigma}, \quad \sigma=\max(b_{gt}^{cy}, b_{pred}^{cy})-\min(b_{gt}^{cy}, b_{pred}^{cy}) \]
   - Distance loss:
     \[ \Delta=\sum_{t = x,y}\left(1 - e^{-(2-\Lambda)\rho_t}\right) \]
     where
     \[ \rho_x=\left(\frac{b_{gt}^{cx}-b_{pred}^{cx}}{C_w}\right)^2, \quad \rho_y=\left(\frac{b_{gt}^{cy}-b_{pred}^{cy}}{C_h}\right)^2 \]
   - Shape loss: \[ \Omega=\s