Abstract: Cyclist trajectory prediction is critical for the local path planning of autonomous vehicles. Based on the assumption that a cyclist's movement is limited by its dynamics and subject to interactions with the environment, a novel LSTM-based cyclist trajectory prediction model is proposed that combines multiple interactions with the surroundings and motion features in a unified framework. Road features describing road boundaries and static obstacles are employed to address the cyclist's interaction with the road. To address interactions with pedestrians, other cyclists, and vehicles, object features including object attributes and relative positions are utilized. A focal attention mechanism is employed to reveal the importance of features at each time step. By feeding the features into an LSTM encoder, the movement in the next two seconds is predicted. Experiments were conducted on two datasets, and the results show that the presented model outperforms state-of-the-art models in most cases.
What problem does this paper attempt to address?
This paper addresses **cyclist trajectory prediction**, which is crucial for the local path planning of autonomous vehicles. The authors propose a model based on LSTM (Long Short-Term Memory) networks that uses multiple interactions between a cyclist and the environment, covering road features (road boundaries and static obstacles) as well as surrounding pedestrians, other cyclists, and vehicles. Combining this interaction information with the cyclist's motion features, the model predicts the cyclist's trajectory over the next two seconds.
### Background and Motivation of the Paper
1. **Cyclist Safety**
- Approximately 1.24 million people die in traffic accidents every year, among which vulnerable road users (VRUs) account for 49% and cyclists account for 27%.
- In underdeveloped and developing countries, where bicycles, electric bicycles, and motorcycles play an indispensable role in traffic, this proportion is as high as 37%.
- In recent years, the safety issues of VRUs have attracted wide attention, and research mainly focuses on target detection, tracking and risk avoidance, etc.
2. **Deficiencies in Existing Research**
- Although much progress has been made in VRU trajectory prediction, most of the research focuses on pedestrians, and relatively little attention is paid to cyclists.
- The movement of cyclists is restricted by bicycle dynamics and the surrounding environment, such as road shape, static obstacles, etc.
- Existing models are usually accurate only over short prediction horizons, and they do not model environmental interactions sufficiently.
### Main Contributions of the Paper
1. **Proposing a Unified Network Framework**
- Predict the trajectories of cyclists by using multiple environmental interaction information and motion information.
2. **Designing a Road Feature Encoder**
- Encode features such as road boundaries and static obstacles through the LSTM model to handle the interaction between cyclists and the road.
3. **Introducing a Focal Attention Mechanism**
- Improve the LSTM model so that it can focus on more relevant features, thereby improving the prediction accuracy.
4. **Constructing the HNU Dataset**
- Evaluate the applicability of the proposed method in the Chinese traffic environment.
### Model Architecture
1. **Motion Encoder**
- Encode the motion information of cyclists, including displacement and velocity.
2. **Road Encoder**
- Encode the relative positions of road key points to handle the interaction between cyclists and the road and static obstacles.
3. **Object Encoder**
- Encode the attributes and relative positions of surrounding objects to handle the interaction between cyclists and other traffic participants.
4. **Decoder**
- Synthesize the features extracted by the above encoders to predict the future movement trajectories of cyclists.
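As a rough illustration of the object encoder listed above, the sketch below applies a two-layer projection to one neighbor's feature vector, following the form \(h_{d,i,j}^t = W_d^l\cdot\tanh(W_d^e\cdot D_{i,j}^t + b_d^e)+b_d^l\) from the formula summary. The layout of `D` (relative position followed by a one-hot object type), the layer sizes, and the weight values are assumptions made for this example, not details taken from the paper.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode_object(D, W_e, b_e, W_l, b_l):
    """Compute h_d = W_l · tanh(W_e · D + b_e) + b_l for one neighbor."""
    hidden = [math.tanh(z + b) for z, b in zip(matvec(W_e, D), b_e)]
    return [z + b for z, b in zip(matvec(W_l, hidden), b_l)]


# Hypothetical feature layout: [dx, dy, is_pedestrian, is_cyclist, is_vehicle]
D = [1.5, -0.5, 0.0, 0.0, 1.0]
W_e = [[0.1, 0.2, 0.0, 0.0, 0.3],
       [-0.2, 0.1, 0.1, 0.1, 0.0]]   # illustrative weights, 5 -> 2
b_e = [0.0, 0.0]
W_l = [[0.5, -0.5],
       [0.25, 0.25],
       [1.0, 0.0]]                    # illustrative weights, 2 -> 3
b_l = [0.0, 0.0, 0.0]

h_d = encode_object(D, W_e, b_e, W_l, b_l)
```

In a trained model the weights would be learned, and the encoder would be applied to every neighbor `j` at every time step `t`.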
### Experimental Results
- On two datasets (the Stanford Drone Dataset and the HNU dataset), the model outperforms state-of-the-art methods in most cases.
### Formula Summary
- **Motion Feature Encoding**
\[
e_{m,i}^t=\tanh(W_m^e\cdot M_i^t + b_m^e)
\]
\[
h_{m,i}^t = \text{LSTM}(W_m^l\cdot e_{m,i}^t + b_m^l)
\]
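A minimal sketch of the two motion-encoding equations, in plain Python, may help make the data flow concrete. It assumes that the motion vector `M` holds displacement and velocity components, that the linear map \(W_m^l, b_m^l\) is represented by the LSTM cell's input weights and biases, and that the cell is a standard LSTM with input, forget, output, and candidate gates. All dimensions and the random initialization are illustrative, not taken from the paper.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def embed_motion(M, W_e, b_e):
    """e = tanh(W_e · M + b_e)"""
    return [math.tanh(z + b) for z, b in zip(matvec(W_e, M), b_e)]


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM cell step; W, U, b are dicts keyed by gate name."""
    def gate(name, act):
        wx = matvec(W[name], x)
        uh = matvec(U[name], h_prev)
        return [act(p + q + r) for p, q, r in zip(wx, uh, b[name])]

    i = gate("i", sigmoid)      # input gate
    f = gate("f", sigmoid)      # forget gate
    o = gate("o", sigmoid)      # output gate
    g = gate("g", math.tanh)    # candidate cell state
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode_motion(track, embed_dim=4, hidden_dim=3, seed=0):
    """Run the embedding and the LSTM over a sequence of motion vectors."""
    rng = random.Random(seed)
    in_dim = len(track[0])
    W_e = rand_matrix(embed_dim, in_dim, rng)
    b_e = [0.0] * embed_dim
    W = {k: rand_matrix(hidden_dim, embed_dim, rng) for k in "ifog"}
    U = {k: rand_matrix(hidden_dim, hidden_dim, rng) for k in "ifog"}
    b = {k: [0.0] * hidden_dim for k in "ifog"}
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for M in track:
        e = embed_motion(M, W_e, b_e)
        h, c = lstm_step(e, h, c, W, U, b)
        states.append(h)
    return states


# Hypothetical track: each step is [dx, dy, vx, vy]
states = encode_motion([[0.1, 0.0, 1.0, 0.0], [0.2, 0.1, 1.0, 0.5]])
```

Each entry of `states` plays the role of \(h_{m,i}^t\) for one time step.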
- **Road Feature Encoding**
\[
e_{r,i}^t=\kappa(R_i^t)
\]
\[
a_i^t=\text{norm}(\langle e_{r,i}^t, h_{m,i}^t\rangle)
\]
\[
h_{r,i}^t=\sum_{v = 1}^V a_{i,v}^t\, e_{r,i,v}^t
\]
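The attention step in the road equations can be sketched as follows. The sketch assumes that \(\text{norm}\) is a softmax over the \(V\) road key points and that the road embeddings (the output of \(\kappa\), which is not specified in this summary) have the same dimension as the motion hidden state. The function name `attend_road` and the example values are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_road(road_embeddings, h_motion):
    """Weight V road key-point embeddings by their similarity to h_motion.

    road_embeddings: list of V vectors, assumed to be the output of kappa
    h_motion: motion hidden state of the same dimension
    """
    # Inner product <e_v, h_m> for each key point v
    scores = [sum(ek * hk for ek, hk in zip(e, h_motion))
              for e in road_embeddings]
    # Assumption: "norm" is implemented as a softmax
    weights = softmax(scores)
    dim = len(h_motion)
    # Weighted sum of the key-point embeddings
    h_road = [sum(w * e[k] for w, e in zip(weights, road_embeddings))
              for k in range(dim)]
    return weights, h_road


weights, h_road = attend_road(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # three hypothetical key points
    [2.0, 0.0],                            # hypothetical motion state
)
```

Key points whose embeddings align with the current motion state receive larger weights, so the road feature is dominated by the parts of the road most relevant to where the cyclist is heading.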
- **Object Feature Encoding**
\[
h_{d,i,j}^t = W_d^l\cdot\tanh(W_d^e\cdot D_{i,j}^t + b_d^e)+b_d^l
\]
\[
h_{k,i,j}^t = W_k^l