Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers

Antonyo Musabini, Ivan Novikov, Sana Soula, Christel Leonet, Lihao Wang, Rachid Benmokhtar, Fabian Burger, Thomas Boulay, Xavier Perrotton

2024-09-30

Abstract: Current parking area perception algorithms primarily focus on detecting vacant slots within a limited range, relying on error-prone homographic projection for both labeling and inference. However, recent advancements in Advanced Driver Assistance Systems (ADAS) require interaction with end-users through comprehensive and intelligent Human-Machine Interfaces (HMIs). These interfaces should present a complete perception of the parking area, from distinguishing the entry lines of vacant slots to the orientation of other parked vehicles. This paper introduces Multi-Task Fisheye Cross-View Transformers (MT F-CVT), which leverages features from a four-camera fisheye Surround-view Camera System (SVCS) with multi-head attention to create a detailed Bird's-Eye View (BEV) grid feature map. The features are processed by both a segmentation decoder and a Polygon-YOLO-based object detection decoder for parking slots and vehicles. Trained on data labeled using LiDAR, MT F-CVT positions objects within 25 m x 25 m real open-road scenes with an average error of only 20 cm. Our larger model achieves an F1 score of 0.89. Moreover, the smaller model operates at 16 fps on an Nvidia Jetson Orin embedded board, with detection results similar to those of the larger one. MT F-CVT demonstrates robust generalization capability across different vehicles and camera rig configurations. A demo video from an unseen vehicle and camera rig is available at: https://streamable.com/jjw54x

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses two limitations of existing parking area perception algorithms: they mostly detect only vacant slots within a limited range, and they rely on error-prone homographic projection for both labeling and inference. At the same time, current Advanced Driver Assistance Systems (ADAS) call for more comprehensive and intelligent Human-Machine Interfaces (HMIs). Such interfaces should present a complete view of the parking area, from the entry lines of vacant slots to the orientation of other parked vehicles.

To this end, the paper proposes the Multi-Task Fisheye Cross-View Transformer (MT F-CVT), which combines features from a four-camera fisheye Surround-view Camera System (SVCS) through multi-head attention to build a detailed bird's-eye view (BEV) grid feature map. A segmentation decoder and a Polygon-YOLO-based object detection decoder then process this map to detect parking slots and vehicles. Trained on LiDAR-labeled data, the model positions objects within 25 m x 25 m real open-road scenes with an average error of 20 cm, and the larger model reaches an F1 score of 0.89. The smaller model runs at 16 frames per second on an Nvidia Jetson Orin embedded board with similar detection results. The authors also report robust generalization across different vehicles and camera rig configurations.
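The idea of letting BEV grid cells attend to features from several cameras can be illustrated with a small NumPy sketch of multi-head cross-attention. This is not the authors' implementation: the function name `cross_view_attention`, the 8 x 8 grid, the 10 tokens per camera, and the feature width are all hypothetical, random matrices stand in for learned projection weights, and the sketch leaves out the positional and camera-geometry information a real model would need.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(bev_queries, cam_features, num_heads, rng):
    """Multi-head cross-attention from BEV cells to camera tokens.

    bev_queries:  (Q, D) array, one query vector per BEV grid cell.
    cam_features: (N_cam, T, D) array, T feature tokens per camera.
    Returns the attended BEV features (Q, D) and the attention
    weights (num_heads, Q, N_cam * T).
    """
    q_len, dim = bev_queries.shape
    assert dim % num_heads == 0, "feature width must split evenly across heads"
    head_dim = dim // num_heads

    # Flatten all cameras into one token sequence so that every BEV
    # cell can attend to every camera.
    tokens = cam_features.reshape(-1, dim)

    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4)
    )

    def split_heads(x):
        # (L, D) -> (num_heads, L, head_dim)
        return x.reshape(x.shape[0], num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(bev_queries @ w_q)
    k = split_heads(tokens @ w_k)
    v = split_heads(tokens @ w_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    attended = weights @ v  # (num_heads, Q, head_dim)

    # Merge the heads back into one feature vector per BEV cell.
    merged = attended.transpose(1, 0, 2).reshape(q_len, dim)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
grid, dim = 8, 16                                  # hypothetical sizes
bev_queries = rng.standard_normal((grid * grid, dim))
cam_features = rng.standard_normal((4, 10, dim))   # four fisheye cameras

bev_features, attn = cross_view_attention(bev_queries, cam_features, 4, rng)
bev_map = bev_features.reshape(grid, grid, dim)    # BEV grid feature map
print(bev_map.shape, attn.shape)
```

In a multi-task setup like the one described, a map shaped like `bev_map` would be the shared input to both decoders. Because all four cameras are flattened into one token sequence, each grid cell can weight evidence from any camera that sees it.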

FisheyeMultiNet: Real-time Multi-task Learning Architecture for Surround-view Automated Parking System

Automatic Parking Based on a Bird's Eye View Vision System

Surround-View Fisheye BEV-Perception for Valet Parking: Dataset, Baseline and Distortion-Insensitive Multi-Task Framework

Parking Spot Classification based on surround view camera system

Moving Object Detection Using an In-Vehicle Fish-Eye Camera

Multi-Camera Visual-Inertial Simultaneous Localization and Mapping for Autonomous Valet Parking

Holistic Parking Slot Detection with Polygon-Shaped Representations

Disentangling and Vectorization: A 3D Visual Perception Approach for Autonomous Driving Based on Surround-View Fisheye Cameras

VH-HFCN Based Parking Slot and Lane Markings Segmentation on Panoramic Surround View

3D visual perception for self-driving cars using a multi-camera system: Calibration, mapping, localization, and obstacle detection

Fisheye Lens Camera based Autonomous Valet Parking System

Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving

Real-time Detection, Tracking, and Classification of Moving and Stationary Objects using Multiple Fisheye Images

Deep Learning Based Video System for Accurate and Real-Time Parking Measurement

Object ground lines regression and mapping from fisheye images to around view image for the AVP

Global Perception-based Robust Parking Space Detection Using a Low-cost Camera

FisheyeMODNet: Moving Object detection on Surround-view Cameras for Autonomous Driving

Visual Parking Occupancy Detection Using Extended Contextual Image Information via a Multi-Branch Output ConvNeXt Network

CMCA-YOLO: A Study on a Real-Time Object Detection Model for Parking Lot Surveillance Imagery

Spatio-Temporal Fusion of LiDAR and Camera Data for Omnidirectional Depth Perception