Abstract: Autonomous driving holds great promise in addressing traffic safety concerns by leveraging artificial intelligence and sensor technology. Multi-Object Tracking plays a critical role in ensuring safer and more efficient navigation through complex traffic scenarios. This paper presents a novel deep learning-based method that integrates radar and camera data to enhance the accuracy and robustness of Multi-Object Tracking in autonomous driving systems. The proposed method leverages a Bi-directional Long Short-Term Memory network to incorporate long-term temporal information and improve motion prediction. An appearance feature model inspired by FaceNet is used to establish associations between objects across different frames, ensuring consistent tracking. A tri-output mechanism is employed, consisting of individual outputs for radar and camera sensors and a fusion output, to provide robustness against sensor failures and produce accurate tracking results. Through extensive evaluations of real-world datasets, our approach demonstrates remarkable improvements in tracking accuracy, ensuring reliable performance even in low-visibility scenarios.
What problem does this paper attempt to address?
This paper aims to improve the accuracy and robustness of multi-object tracking (MOT) in autonomous driving systems by fusing millimeter-wave (mmWave) radar and camera data. Specifically, it targets the following challenges:
1. **Limitations of a single sensor**: Most current MOT systems rely on visual data, which is prone to tracking failure under occlusion, sudden appearance changes, or reflections. For example, when a target is hidden behind another object or the ambient light changes, camera data alone may not be enough to keep tracking it.
2. **Insufficient use of long-term temporal information**: Traditional methods based on Bayesian filtering (such as the Kalman filter) usually overlook long-term temporal information, which can lead to inaccurate motion prediction in complex traffic scenarios.
3. **Identity switching**: Re-identifying targets across frames is a key issue in multi-object tracking. If associations between frames are not established accurately, target identities get confused, which is the identity switching problem.
To address these problems, the paper proposes a deep learning-based multi-object tracking method. The main innovations include:
- **Fusing radar and camera data**: Combining the strengths of the two sensors improves the robustness and accuracy of the system. Radar performs well in bad weather and low-light conditions, while the camera provides rich appearance information.
- **Using a Bi-LSTM network**: A bidirectional long short-term memory (Bi-LSTM) network is introduced to integrate long-term temporal information and improve the accuracy of motion prediction. A Bi-LSTM processes a sequence in both the forward and backward directions, so each time step can draw on both earlier and later context, which helps capture the motion patterns of targets.
- **FaceNet-inspired appearance feature model**: A deep learning model similar to FaceNet extracts appearance features of targets to reduce identity switching. Comparing the appearance feature distances between targets allows associations across frames to be established more accurately.
- **Tri-output mechanism**: The network produces three outputs: one for the radar, one for the camera, and one for their fusion. This mechanism compensates for the permanent or temporary failure of a single sensor and further improves tracking accuracy by fusing complementary information.
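The appearance-based association described above can be illustrated with a minimal sketch. It assumes each target is already represented by an embedding vector (as a FaceNet-style model would produce) and uses a simple greedy matcher with a Euclidean distance threshold. The function names, the threshold value `max_distance=0.8`, and the greedy strategy are illustrative assumptions, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def embedding_distance(a, b):
    """Euclidean distance between two L2-normalized embeddings."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a_n, b_n)))


def associate(tracks, detections, max_distance=0.8):
    """Greedily match detections to existing tracks by appearance distance.

    tracks: dict mapping track_id -> embedding (hypothetical track store)
    detections: list of embeddings from the current frame
    max_distance: assumed gating threshold; pairs farther apart stay unmatched
    Returns a dict mapping detection index -> track_id.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (embedding_distance(t_emb, d_emb), t_id, d_idx)
        for t_id, t_emb in tracks.items()
        for d_idx, d_emb in enumerate(detections)
    )
    matches = {}
    used_tracks = set()
    for dist, t_id, d_idx in pairs:
        if dist > max_distance:
            break  # remaining pairs are even farther apart
        if t_id in used_tracks or d_idx in matches:
            continue  # each track and each detection is matched at most once
        matches[d_idx] = t_id
        used_tracks.add(t_id)
    return matches


tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
matches = associate(tracks, detections)
```

In this toy example the first two detections keep the identities of the tracks they resemble, while the third is too far from every track and would start a new track. A production tracker would more likely solve the assignment globally (for example with the Hungarian algorithm) and combine appearance distance with motion cues.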
Through these innovations, the paper aims to provide a more reliable and accurate multi-object tracking method, especially in complex and low-visibility traffic scenarios.
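To make the robustness argument behind the radar, camera, and fusion outputs concrete, the sketch below shows one way a downstream consumer might choose among the three. The summary does not say how the outputs are selected, so the priority order (fusion, then camera, then radar), the sensor health flags, and all names here are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical box format: (x, y, width, height).
Box = Tuple[float, float, float, float]


@dataclass
class TriOutput:
    """The three per-target outputs: radar-only, camera-only, and fused."""
    radar: Optional[Box]
    camera: Optional[Box]
    fusion: Optional[Box]


def select_output(out: TriOutput, radar_ok: bool, camera_ok: bool):
    """Pick which output to trust, given assumed sensor health flags.

    Assumed policy: prefer the fused estimate when both sensors are healthy,
    otherwise fall back to whichever single-sensor output is still available.
    Returns a (source_name, box) pair; box is None if nothing is usable.
    """
    if radar_ok and camera_ok and out.fusion is not None:
        return "fusion", out.fusion
    if camera_ok and out.camera is not None:
        return "camera", out.camera
    if radar_ok and out.radar is not None:
        return "radar", out.radar
    return "none", None


out = TriOutput(
    radar=(10.0, 5.0, 2.0, 4.0),
    camera=(10.4, 5.1, 2.1, 4.2),
    fusion=(10.2, 5.0, 2.0, 4.1),
)
normal = select_output(out, radar_ok=True, camera_ok=True)
camera_down = select_output(out, radar_ok=True, camera_ok=False)
```

With both sensors healthy the fused box is used; if the camera drops out (for example in low visibility), the radar-only output still yields a track instead of losing the target.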