Abstract: Thanks to its wide field of view, a fisheye camera can capture much more visual information, so it is widely used in computer vision. However, fisheye images usually have to be projected before they can be used for object detection. The projection introduces distortion, and the discontinuous image edges leave objects incomplete. In addition, objects close to the fisheye lens appear large, while objects farther away appear small. These problems remain challenging for the existing advanced object detector YOLOv7. Therefore, in this paper, we propose an improved YOLOv7 model. First, Modulated Deformable Convolution is introduced into YOLOv7 to adapt automatically to the varying distortion of objects in fisheye images. It not only adjusts the sampling positions of the convolutional kernel but also further extends the deformation range, so the improved model can efficiently extract features of distorted and edge-discontinuous objects. Second, to further improve the detection of small objects in fisheye images, Swin Transformer is introduced into YOLOv7; the Swin Transformer Block with Window Multi-head Self-Attention (W-MSA) effectively enhances the network's local perception. On the ERP-360 dataset, the proposed model improves mAP by up to 2.4% over the original YOLOv7 and achieves the best results among state-of-the-art object detection methods for equirectangular projection images. On the VOC-360 dataset, it improves mAP by up to 5.9% over the original YOLOv7. The experimental results show that the proposed models achieve good results for object detection in both fisheye images and equirectangular projection images.
The ERP-360 dataset, source code and pre-trained models for related tasks can be found at https://github.com/xiaoxiaomichong/ERP-360dataset.
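
As an illustration of the sampling idea behind Modulated Deformable Convolution mentioned in the abstract, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes the commonly used formulation in which each kernel tap reads the input at its regular grid position plus an offset and is scaled by a modulation scalar. The function names `bilinear` and `modulated_deform_conv_at` are hypothetical, and the offsets and masks are set by hand here, whereas a real network would predict them from the features.

```python
import math

def bilinear(img, y, x):
    """Sample a 2D list at fractional (y, x); points outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * img[yy][xx]
    return val

def modulated_deform_conv_at(img, weights, offsets, masks, cy, cx):
    """Output of a 3x3 modulated deformable convolution at one location (cy, cx).

    weights: 9 kernel weights, row-major over the 3x3 grid
    offsets: 9 (dy, dx) pairs (hand-set here; an assumption for illustration)
    masks:   9 modulation scalars in [0, 1] (hand-set here as well)
    """
    out = 0.0
    k = 0
    for gy in (-1, 0, 1):
        for gx in (-1, 0, 1):
            dy, dx = offsets[k]
            # Regular grid position, shifted by the offset, scaled by the mask.
            out += weights[k] * masks[k] * bilinear(img, cy + gy + dy, cx + gx + dx)
            k += 1
    return out

# Toy 4x4 input whose value at (row, col) is 4*row + col.
img = [[4 * r + c for c in range(4)] for r in range(4)]
avg = [1 / 9] * 9

# Zero offsets and unit masks reduce to an ordinary 3x3 averaging convolution.
plain = modulated_deform_conv_at(img, avg, [(0, 0)] * 9, [1.0] * 9, 1, 1)

# Shifting every tap by half a pixel moves the sampled region; masks of 0.5
# halve each tap's contribution.
shifted = modulated_deform_conv_at(img, avg, [(0.5, 0.5)] * 9, [0.5] * 9, 1, 1)
```

With zero offsets and unit masks the operation is a standard convolution, which is why the deformable variant can be read as a strict generalization: the offsets move the sampling positions and the masks reweight them.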

GET: Group Equivariant Transformer for Person Detection of Overhead Fisheye Images

Rotation-equivariant Transformer for Oriented Person Detection of Overhead Fisheye Images

Object Detection and Localization in 3D Environment by Fusing Raw Fisheye Image and Attitude Data

Downside Hemisphere Object Detection and Localization of MAV by Fisheye Camera

Orientation-aware People Detection and Counting Method Based on Overhead Fisheye Camera

Improved YOLOv7 models based on modulated deformable convolution and swin transformer for object detection in fisheye images

Multi-Camera Calibration Free BEV Representation for 3D Object Detection

Location-guided Head Pose Estimation for Fisheye Image

EVT: Efficient View Transformation for Multi-Modal 3D Object Detection

GET: Group Event Transformer for Event-Based Vision

FPNFormer: Rethink the Method of Processing the Rotation-Invariance and Rotation-Equivariance on Arbitrary-Oriented Object Detection

OEGR-DETR: A Novel Detection Transformer Based on Orientation Enhancement and Group Relations for SAR Object Detection

An adversarial pedestrian detection model based on virtual fisheye image training

Swin‐fisheye: Object detection for fisheye images

AST: Annulus Swin Transformer for Pedestrians Detection under Top-view Fisheye Image.

OARPD: occlusion-aware rotated people detection in overhead fisheye images

RMDC: Rotation-mask Deformable Convolution for Object Detection in Top-View Fisheye Cameras

FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera

MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers

FishFormer: Annulus Slicing-based Transformer for Fisheye Rectification with Efficacy Domain Exploration

Geometric Features Enhanced Human-Object Interaction Detection