Abstract: Autonomous racing has rapidly gained research attention. Traditionally, racing cars rely on 2D LiDAR as their primary visual system. In this work, we explore the integration of an event camera with the existing system to provide enhanced temporal information. Our goal is to fuse the 2D LiDAR data with event data in an end-to-end learning framework for steering prediction, which is crucial for autonomous racing. To the best of our knowledge, this is the first study addressing this challenging research topic. We start by creating a multisensor dataset specifically for steering prediction. Using this dataset, we establish a benchmark by evaluating various SOTA fusion methods. Our observations reveal that existing methods often incur substantial computational costs. To address this, we apply low-rank techniques to propose a novel, efficient, and effective fusion design. We introduce a new fusion learning policy to guide the fusion process, enhancing robustness against misalignment. Our fusion architecture provides better steering prediction than LiDAR alone, significantly reducing the RMSE from 7.72 to 1.28. Compared to the second-best fusion method, our method uses only 11% of the learnable parameters while achieving better accuracy. The source code, dataset, and benchmark will be released to promote future research.

What problem does this paper attempt to address?

The problem this paper addresses is **steering angle prediction for autonomous racing cars**, specifically on the F1tenth prototype car. Traditional approaches rely on 2D LiDAR as the primary vision system, which has two limitations:

1. **Insufficient spatial perception**: 2D LiDAR is only sensitive to depth changes and lacks spatial perception along the Y and Z axes.
2. **Lack of temporal cues**: 2D LiDAR cannot provide sufficient dynamic information, so perception easily lags in high-speed dynamic environments, which impairs the vehicle's ability to make quick decisions.

To address these limitations, the authors propose a new multi-sensor fusion method that combines 2D LiDAR with an event camera to improve the accuracy and real-time performance of steering prediction. The main contributions of the study are:

- **Creation of a multi-sensor dataset**: A multi-sensor dataset was created specifically for steering prediction and used to evaluate different fusion methods.
- **Application of low-rank techniques**: To reduce the computational cost of existing fusion methods, low-rank techniques were used to design a new, efficient, and effective fusion architecture.
- **New fusion learning strategy**: A new fusion learning strategy was proposed. By maximizing the joint entropy between the two sensor inputs, it makes the fusion process more robust, especially when the sensors are poorly aligned.
- **Significant error reduction**: Compared with using only 2D LiDAR, the new method reduces the RMSE from 7.72 to 1.28, and it uses only 11% of the learnable parameters of the second-best fusion method while achieving better accuracy.

Together, these improvements raise steering prediction performance for autonomous racing cars and provide a benchmark and dataset for future research.
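The summary does not describe the paper's fusion architecture, so the following is only a generic illustration of why low-rank techniques cut the parameter count. It assumes a bilinear fusion of a LiDAR feature vector and an event feature vector, and replaces the full weight tensor with three rank-`r` factor matrices. All names (`low_rank_fuse`, `U`, `V`, `P`) and all sizes are hypothetical, and the resulting parameter ratio is not the 11% figure reported above.

```python
import numpy as np


def full_bilinear_params(d_lidar: int, d_event: int, d_out: int) -> int:
    """Parameter count of a full bilinear weight tensor W[d_lidar, d_event, d_out]."""
    return d_lidar * d_event * d_out


def low_rank_params(d_lidar: int, d_event: int, d_out: int, rank: int) -> int:
    """Parameter count when W is replaced by factors U, V, and P of the given rank."""
    return d_lidar * rank + d_event * rank + rank * d_out


def low_rank_fuse(f_lidar, f_event, U, V, P):
    """Project both inputs to a rank-r space, multiply elementwise, map to the output."""
    return ((f_lidar @ U) * (f_event @ V)) @ P


# Hypothetical feature sizes, chosen only for illustration.
d_lidar, d_event, d_out, rank = 64, 64, 32, 8

rng = np.random.default_rng(0)
U = rng.standard_normal((d_lidar, rank))   # LiDAR factor
V = rng.standard_normal((d_event, rank))   # event factor
P = rng.standard_normal((rank, d_out))     # output factor

fused = low_rank_fuse(
    rng.standard_normal(d_lidar), rng.standard_normal(d_event), U, V, P
)

print("fused feature shape:", fused.shape)
print("full bilinear parameters:", full_bilinear_params(d_lidar, d_event, d_out))
print("low-rank parameters:", low_rank_params(d_lidar, d_event, d_out, rank))
```

The factorized form computes the same output as a full bilinear layer whose weight tensor is `W[i, j, k] = sum_r U[i, r] * V[j, r] * P[r, k]`, but it never materializes that tensor, which is where the saving comes from.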
### Formula Summary

- **Event stream definition**:
  \[ \varepsilon=\{e_i \mid e_i = ((x_i,y_i),t_i,p_i),\ t_i\in[t_{\text{start}},t_{\text{end}}]\} \]
  where \(e_i\) represents a single event, \((x_i,y_i)\) are its pixel coordinates, \(t_i\) is its timestamp, and \(p_i\in\{+1,-1\}\) is the polarity of the brightness change.
- **Projection model**:
  \[ p_{\text{image}}=K[R|t]P_{\text{LiDAR}} \]
  where \(P_{\text{LiDAR}}\) is a 3D point in the LiDAR coordinate system, \(R\in\mathbb{R}^{3\times3}\) is a rotation matrix, \(t\in\mathbb{R}^{3\times1}\) is a translation vector, and \(K\in\mathbb{R}^{3\times3}\) is the intrinsic parameter matrix of the event camera.
- **Similarity loss**:
  \[ L_{\text{div}}=L_{\text{KL}}(f_S,f_D)+L_{\text{KL}}(f_S,f_E) \]
  \[ L_{\text{KL}}(A,B)=\text{KL}(A\|B)+\text{KL}(B\|A) \]
- **Overall loss function**:
  \[ L = \lambda\cdot L_{\text{div}}+L_2 \]
  where \(\lambda\) is a hyperparameter, set to 0.25.
- **Root-mean-square error (RMSE)**:
  \[ \text{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^N(y_i-\hat{y}_i)^2} \]
- **Mean absolute error (MAE)**:
  \[ \text{MAE}=\frac{1}{N}\sum_{i=1}^N|y_i-\hat{y}_i| \]
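The formulas above can be sketched in NumPy as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the summary does not define \(f_S\), \(f_D\), and \(f_E\), so the sketch treats them as feature vectors and turns them into probability distributions with a softmax before computing the KL terms; \(L_2\) is assumed to be the mean squared error on the steering angle; and the projection applies the usual perspective division to get pixel coordinates. All function names are hypothetical.

```python
import numpy as np


def project_lidar_point(P_lidar, K, R, t):
    """Project a 3D LiDAR point to pixel coordinates via p_image = K [R|t] P_LiDAR."""
    P_cam = R @ P_lidar + t      # same as applying [R|t] to the homogeneous point
    uvw = K @ P_cam              # homogeneous image coordinates
    return uvw[:2] / uvw[2]      # perspective division (assumes positive depth)


def softmax(x):
    """Turn a feature vector into a probability distribution (assumption, see lead-in)."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def kl(a, b, eps=1e-12):
    """KL(a || b) for discrete distributions, clipped to avoid log(0)."""
    a = np.clip(a, eps, None)
    b = np.clip(b, eps, None)
    return float(np.sum(a * np.log(a / b)))


def symmetric_kl(a, b):
    """L_KL(A, B) = KL(A || B) + KL(B || A)."""
    return kl(a, b) + kl(b, a)


def total_loss(f_S, f_D, f_E, y_true, y_pred, lam=0.25):
    """L = lambda * L_div + L_2, with L_2 assumed to be mean squared error."""
    p_S, p_D, p_E = softmax(f_S), softmax(f_D), softmax(f_E)
    l_div = symmetric_kl(p_S, p_D) + symmetric_kl(p_S, p_E)
    l_2 = float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
    return lam * l_div + l_2


def rmse(y, y_hat):
    """Root-mean-square error between targets and predictions."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    """Mean absolute error between targets and predictions."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))


# Hypothetical camera intrinsics and an identity extrinsic transform.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
pixel = project_lidar_point(np.array([1.0, 0.0, 2.0]), K, np.eye(3), np.zeros(3))
print("projected pixel:", pixel)
print("RMSE:", rmse([1, 2, 3], [1, 2, 5]), "MAE:", mae([1, 2, 3], [1, 2, 5]))
```

Because RMSE squares each error before averaging, a single large steering error raises it more than it raises MAE, which is why the two metrics are usually reported together.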

Steering Prediction via a Multi-Sensor System for Autonomous Racing

A Fusion Method Aiming at Environmental Perception of Autonomous Vehicle Based on Visual Scheme

Single-Camera and Inter-Camera Vehicle Tracking and 3D Speed Estimation Based on Fusion of Visual and Semantic Features

Multi-Modal Sensor Fusion and Object Tracking for Autonomous Racing

Radar and Camera Fusion for Multi-Task Sensing in Autonomous Driving

Learning End-to-End Autonomous Steering Model from Spatial and Temporal Visual Cues

Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues

Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions

Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving

RPRP-SAP: A Robust and Precise ResNet Predictor for Steering Angle Prediction of Autonomous Vehicles

Event-based Vision meets Deep Learning on Steering Prediction for Self-driving Cars

DeepRacing: Parameterized Trajectories for Autonomous Racing

Real-time depth completion based on LiDAR-stereo for autonomous driving

3D Multiple Object Tracking with Multi-modal Fusion of Low-cost Sensors for Autonomous Driving

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

3D object detection and state estimation method based on stereo vision and LIDAR fusion

Radar Camera Fusion via Representation Learning in Autonomous Driving

Predictive Spliner: Data-Driven Overtaking in Autonomous Racing Using Opponent Trajectory Prediction

Multi-Camera Object Fusion Tracking Model for Autonomous Driving

Optical Flow Matters: an Empirical Comparative Study on Fusing Monocular Extracted Modalities for Better Steering