Camiel Oerlemans, Bram Grooten, Michiel Braat, Alaa Alassi, Emilia Silvas, Decebal Constantin Mocanu
Abstract: Predicting the behavior of road users accurately is crucial to enable the safe operation of autonomous vehicles in urban or densely populated areas. Therefore, there has been a growing interest in time series motion prediction research, leading to significant advancements in state-of-the-art techniques in recent years. However, the potential of using LiDAR data to capture more detailed local features, such as a person's gaze or posture, remains largely unexplored. To address this, we develop a novel multimodal approach for motion prediction based on the PointNet foundation model architecture, incorporating local LiDAR features. Evaluation on the Waymo Open Dataset shows a performance improvement of 6.20% and 1.58% in minADE and mAP respectively, when integrated and compared with the previous state-of-the-art MTR. We open-source the code of our LiMTR model.
What problem does this paper attempt to address?
This paper aims to help autonomous vehicles predict more accurately how different road users (such as pedestrians, cyclists and vehicles) will behave in urban or densely populated areas. Specifically, the authors improve a time-series motion prediction model by integrating LiDAR data, so that it can capture more detailed local features (such as a person's gaze or posture).
### Problem Background
Current time-series motion prediction relies mainly on coarse-grained inputs, such as the position, speed, acceleration and bounding box of each target produced by the object detection step, combined with accurate road information. This representation is efficient, but it can overlook fine-grained information about the target, such as a pedestrian's gaze direction or a cyclist's posture. Such fine-grained cues are very important for predicting the behavior of vulnerable road users (such as pedestrians and cyclists).
### Solution
To solve this problem, the authors propose LiMTR (LiDAR Motion Transformer), a new multimodal method that uses a PointNet-based encoder to integrate LiDAR data directly into the MTR motion prediction model. Its key elements and reported results are:
1. **Local LiDAR Feature Extraction**: Only the subset of the LiDAR point cloud belonging to each target road user is used, which helps the model focus on features specific to that target (such as a person's posture or gaze direction).
2. **LiDAR Encoder Design**: The LiDAR encoder is based on the PointNet architecture, which processes raw point cloud data directly, without voxelization or other complex pre-processing steps.
3. **Performance Improvement**: Experiments on the Waymo Open Dataset show that LiMTR improves on the previous state-of-the-art MTR by 6.20% in minimum average displacement error (minADE) and 1.58% in mean average precision (mAP), with the benefit most pronounced for vulnerable road users (such as pedestrians and cyclists).
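The first two items can be illustrated with a minimal NumPy sketch. It is not the LiMTR implementation: the function names `crop_points_to_box` and `pointnet_encode`, the box parameterization (center, size, yaw heading) and the randomly initialized weights are all assumptions made for illustration. The sketch only shows the two ideas named above: keeping the points inside one agent's bounding box, and encoding them with a shared per-point MLP followed by max pooling, which is the core of PointNet.

```python
import numpy as np


def crop_points_to_box(points, center, size, heading):
    """Keep the LiDAR points inside one agent's oriented 3D box.

    points:  (N, 3) array of xyz coordinates in the world frame
    center:  (3,) box center
    size:    (3,) box length, width, height
    heading: yaw angle of the box in radians
    Returns the selected points in the box frame (centered, yaw-aligned).
    """
    shifted = points - center
    c, s = np.cos(heading), np.sin(heading)
    # Rotation by -heading, so the box axes line up with the coordinate axes.
    rot = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    local = shifted @ rot.T
    inside = np.all(np.abs(local) <= size / 2.0, axis=1)
    return local[inside]


def pointnet_encode(points, weights, biases):
    """Shared per-point MLP with ReLU, then max pooling over all points.

    weights/biases are hypothetical, untrained parameters; a real encoder
    would learn them. Returns one fixed-size feature vector per agent.
    """
    out_dim = weights[-1].shape[1]
    if points.shape[0] == 0:
        # No LiDAR returns for this agent: fall back to a zero feature.
        return np.zeros(out_dim)
    feats = points
    for w, b in zip(weights, biases):
        feats = np.maximum(feats @ w + b, 0.0)
    # Max pooling makes the result independent of the point ordering.
    return feats.max(axis=0)


# Example with random data (placeholder values, not from the paper).
rng = np.random.default_rng(0)
cloud = rng.uniform(-5.0, 5.0, size=(2000, 3))
agent_points = crop_points_to_box(
    cloud,
    center=np.array([1.0, 0.5, 0.0]),
    size=np.array([1.0, 1.0, 2.0]),
    heading=0.3,
)
weights = [rng.normal(size=(3, 16)), rng.normal(size=(16, 32))]
biases = [np.zeros(16), np.zeros(32)]
agent_feature = pointnet_encode(agent_points, weights, biases)
print(agent_points.shape, agent_feature.shape)
```

Because the pooled feature has a fixed size regardless of how many points fall inside the box, it can be combined with the other per-agent inputs of a motion prediction model; how LiMTR does this exactly is not described in this summary.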
### Conclusion
By introducing local LiDAR features, LiMTR significantly improves the prediction accuracy of the future trajectories of different road users, especially for vulnerable road users such as pedestrians and cyclists. This enables autonomous vehicles to operate more safely in complex urban environments.
### Formula Representation
- Minimum Average Displacement Error (minADE):
\[
\text{minADE}=\min_{i = 1}^{m}\frac{1}{T}\sum_{t = 1}^{T}\|\mathbf{x}_t^{\text{pred},i}-\mathbf{x}_t^{\text{gt}}\|_2
\]
where \(\mathbf{x}_t^{\text{pred},i}\) is the position of the \(i\)-th predicted trajectory at time \(t\), \(\mathbf{x}_t^{\text{gt}}\) is the position of the ground-truth trajectory at time \(t\), \(T\) is the number of time steps, and \(m\) is the number of predicted trajectories.
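The minADE formula can be computed directly, as in the following sketch. The function name `min_ade` and the array layout (`pred` of shape `(m, T, 2)`, `gt` of shape `(T, 2)`) are assumptions for illustration; the trajectories are toy values, not data from the paper.

```python
import numpy as np


def min_ade(pred, gt):
    """Minimum average displacement error for one agent.

    pred: (m, T, 2) array of m predicted trajectories
    gt:   (T, 2) array with the ground-truth trajectory
    """
    # L2 distance at every time step for every prediction: shape (m, T).
    dists = np.linalg.norm(pred - gt[None, :, :], axis=-1)
    # Average over time per prediction, then keep the best prediction.
    return dists.mean(axis=1).min()


# Toy example: two predictions, three time steps.
gt = np.zeros((3, 2))
pred = np.stack([
    gt + np.array([3.0, 4.0]),  # always 5 units away from the ground truth
    gt + np.array([1.0, 0.0]),  # always 1 unit away from the ground truth
])
print(min_ade(pred, gt))  # 1.0
```

Since only the best of the \(m\) predictions counts, minADE rewards a model for covering the true future with at least one of its candidate trajectories.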
- Mean Average Precision (mAP):
\[
\text{mAP}=\frac{1}{C}\sum_{c = 1}^{C}\text{AP}_c
\]
where \(C\) is the number of road user categories (such as pedestrians, cyclists, vehicles), and \(\text{AP}_c\) is the average precision of the \(c\)-th category.
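As a small illustration of the mAP formula above, the sketch below averages per-category AP values. The function name `mean_ap` and the AP numbers are placeholders chosen for the example, not results from the paper, and the sketch assumes the per-category AP values have already been computed by the benchmark's own evaluation procedure.

```python
def mean_ap(ap_per_class):
    """Average the per-category AP values, as in the mAP formula above."""
    if not ap_per_class:
        raise ValueError("at least one category is required")
    return sum(ap_per_class.values()) / len(ap_per_class)


# Placeholder AP values for illustration only.
example_ap = {"vehicle": 0.5, "pedestrian": 0.25, "cyclist": 0.75}
print(mean_ap(example_ap))  # 0.5
```

Each category contributes equally to the average, so a gain on pedestrians or cyclists moves mAP as much as the same gain on vehicles.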
Through these improvements, LiMTR provides more reliable support for the safe operation of autonomous vehicles.