Abstract: The YOLO series comprises classic frameworks in the field of object detection and has achieved remarkable results on general datasets. Among them, YOLOv5, a single-stage multi-scale detector, offers strong accuracy and speed, but it still suffers from inaccurate localization when detecting objects. To address this problem, we propose three improvements to YOLOv5. First, because the classification and regression tasks conflict, we decouple classification and localization in the detection head. Second, because the feature fusion method used by YOLOv5 can cause feature misalignment, we add deformable convolution to automatically align features of different scales. Finally, we apply the proposed multi-scale attention mechanism to features of adjacent scales to predict a relative weighting between them. Experiments on the PASCAL VOC dataset show that our method achieves a mAP0.5 of 85.11% and a mAP0.5:0.95 of 63.33%.
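
The idea of predicting a relative weighting between adjacent scales can be illustrated with a minimal sketch. This is not the proposed attention module, whose structure the abstract does not specify. It assumes a per-position weight obtained from a sigmoid over a linear function of the two features (a stand-in for a learned layer), nearest-neighbour upsampling of the coarser map, and a convex blend of the two scales. The names `fuse_adjacent_scales`, `upsample_nearest_2x`, `w_fine`, `w_coarse`, and `bias` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def fuse_adjacent_scales(fine, coarse, w_fine=1.0, w_coarse=-1.0, bias=0.0):
    """Blend a fine map with an upsampled coarse map.

    Assumption: the relative weight alpha is sigmoid(w_fine*f + w_coarse*c + bias),
    a hand-set stand-in for whatever a trained attention branch would predict.
    The output at each position is alpha*f + (1 - alpha)*c.
    """
    coarse_up = upsample_nearest_2x(coarse)
    if len(coarse_up) != len(fine) or any(
        len(rc) != len(rf) for rc, rf in zip(coarse_up, fine)
    ):
        raise ValueError("fine map must be exactly twice the size of the coarse map")

    fused = []
    for row_f, row_c in zip(fine, coarse_up):
        fused_row = []
        for f, c in zip(row_f, row_c):
            alpha = sigmoid(w_fine * f + w_coarse * c + bias)  # weight of the fine scale
            fused_row.append(alpha * f + (1.0 - alpha) * c)
        fused.append(fused_row)
    return fused
```

Because the two weights sum to one at every position, the fused value always lies between the fine and coarse responses; with all parameters set to zero the blend reduces to a plain average of the two scales.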

A Decoupled YOLOv5 with Deformable Convolution and Multi-scale Attention