Abstract: Autonomous driving holds great promise in addressing traffic safety concerns by leveraging artificial intelligence and sensor technology. Multi-Object Tracking plays a critical role in ensuring safer and more efficient navigation through complex traffic scenarios. This paper presents a novel deep learning-based method that integrates radar and camera data to enhance the accuracy and robustness of Multi-Object Tracking in autonomous driving systems. The proposed method leverages a Bi-directional Long Short-Term Memory network to incorporate long-term temporal information and improve motion prediction. An appearance feature model inspired by FaceNet is used to establish associations between objects across different frames, ensuring consistent tracking. A tri-output mechanism is employed, consisting of individual outputs for radar and camera sensors and a fusion output, to provide robustness against sensor failures and produce accurate tracking results. Through extensive evaluations on real-world datasets, our approach demonstrates remarkable improvements in tracking accuracy, ensuring reliable performance even in low-visibility scenarios.

What problem does this paper attempt to address?

This paper aims to improve the accuracy and robustness of multi-object tracking (MOT) in autonomous driving systems by fusing millimeter-wave (mmWave) radar and camera data. Specifically, it addresses the following challenges:

1. **Limitations of a single sensor**: Most current MOT systems rely on visual data, which is prone to tracking failure under object occlusion, sudden appearance changes, or reflection interference. For example, when a target is occluded by other objects or the ambient light changes, camera data alone may not be enough to track it continuously.
2. **Insufficient use of long time-series information**: Traditional methods based on Bayesian filtering (such as the Kalman filter) usually overlook long time-series information, which may lead to inaccurate motion prediction in complex traffic scenarios.
3. **Identity switching**: Re-identifying targets across frames is a key issue in multi-object tracking. If associations between frames cannot be established accurately, target identities may be confused, which is the identity switching problem.

To address these problems, the paper proposes a deep-learning-based multi-object tracking method. Its main innovations are:

- **Fusing radar and camera data**: Combining the strengths of radar and camera improves the robustness and accuracy of the system. Radar performs well in bad weather and low-light conditions, while the camera provides rich appearance information.
- **Using a Bi-LSTM network**: A bidirectional long short-term memory (Bi-LSTM) network is introduced to integrate long time-series information and improve the accuracy of motion prediction. Bi-LSTM can consider both past and future context simultaneously, and thus better captures the motion patterns of targets.
- **FaceNet-inspired appearance feature model**: A deep learning model similar to FaceNet extracts the appearance features of targets to reduce identity switching. Computing the appearance feature distances between targets allows associations between frames to be established more accurately.
- **Tri-output mechanism**: A three-output structure produces separate results for the radar, the camera, and their fusion. This mechanism can compensate for the permanent or temporary failure of a single sensor, and it further improves tracking accuracy by fusing complementary information.

Through these innovations, the paper aims to provide a more reliable and accurate multi-object tracking method, especially in complex and low-visibility traffic scenarios.
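As an illustration of two of the ideas above, the following Python sketch shows how appearance feature distances could drive frame-to-frame association, and how separate radar, camera, and fusion outputs could be used when one sensor drops out. It is not the paper's implementation: the function names, the greedy matching strategy, the `max_distance` threshold, and the rule in `select_output` are all assumptions made for this example, and the embeddings are hand-written vectors rather than the output of a trained network.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length, as FaceNet-style embeddings usually are."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def embedding_distance(a, b):
    """Euclidean distance between two L2-normalized embeddings (between 0 and 2)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def associate(tracks, detections, max_distance=0.8):
    """Greedily match track IDs to detection indices by appearance distance.

    tracks: dict mapping track ID -> embedding from an earlier frame.
    detections: list of embeddings from the current frame.
    max_distance: hypothetical gating threshold, not a value from the paper.
    Returns a dict mapping track ID -> detection index.
    """
    # Rank every (track, detection) pair from most to least similar.
    candidates = sorted(
        (embedding_distance(emb, det), track_id, det_idx)
        for track_id, emb in tracks.items()
        for det_idx, det in enumerate(detections)
    )
    matches, used = {}, set()
    for dist, track_id, det_idx in candidates:
        if dist > max_distance:
            break  # all remaining pairs are even farther apart
        if track_id in matches or det_idx in used:
            continue  # each track and each detection is matched at most once
        matches[track_id] = det_idx
        used.add(det_idx)
    return matches


def select_output(radar_out, camera_out, fusion_out):
    """Assumed fallback rule: use the fusion output when both sensors are
    available, otherwise the output of the surviving sensor (None = unavailable)."""
    if radar_out is not None and camera_out is not None:
        return fusion_out
    if radar_out is not None:
        return radar_out
    return camera_out


# Two existing tracks and three detections; the third detection resembles neither track.
tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, 0.0]]
matches = associate(tracks, detections)
print(sorted(matches.items()))  # [(1, 1), (2, 0)]
```

Detections that stay unmatched, such as the third one here, would typically start a new track. A globally optimal assignment (for example, the Hungarian algorithm) could replace the greedy loop; the greedy version is used only to keep the sketch dependency-free.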

Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors
