GET: Group Equivariant Transformer for Person Detection of Overhead Fisheye Images

Yongqing Chen, Dandan Zhu, Nanyu Li, You Zhou, Yong Bai
DOI: https://doi.org/10.1007/s10489-023-04747-6
IF: 5.3
2023-01-01
Applied Intelligence
Abstract: Fisheye cameras have a large field of view, so they are widely used in scene monitoring, robot navigation, intelligent systems, virtual reality panoramas, augmented reality panoramas, and other fields. However, person detection under an overhead fisheye camera remains a challenge because of the camera's unique radial geometry and barrel distortion. Generic object detection algorithms do not work well for person detection on panoramic fisheye images. Recent approaches either use radially aligned bounding boxes to detect persons or improve anchor-based methods to obtain rotated bounding boxes. However, these methods require additional hyperparameters (e.g., anchor boxes) and have low generalization ability. To address this issue, we propose a novel model called Group Equivariant Transformer (GET), which uses the Transformer to directly regress the bounding boxes and rotation angles. GET does not need any additional hyperparameters and has good generalization ability. In GET, we use the Group Equivariant Convolutional Network (GECN) and the Multi-Scale Encoder Module (MEM) to extract multi-scale rotated embedding features of the overhead fisheye image for the Transformer, and we propose an embedding optimization loss to improve the diversity of these features. Finally, we use a Decoder Module (DM) to decode the rotated bounding box information from the embedding features. Extensive experiments conducted on three benchmark fisheye camera datasets demonstrate that the proposed method achieves state-of-the-art performance.