Pedestrian behavior prediction model with a convolutional LSTM encoder–decoder

Kai Chen, Xiao Song, Daolin Han, Jinghan Sun, Yong Cui, Xiaoxiang Ren
DOI: https://doi.org/10.1016/j.physa.2020.125132
2020-12-01
Abstract: Pedestrian behavior modelling is a challenging problem, especially in crowded transportation scenarios. Some recent studies have addressed this problem using deep neural networks, but the accuracy of trajectory prediction is still not high because the internal structure of the typical deep neural network with long short-term memory (LSTM) is a one-dimensional vector, which destroys the spatial information around a pedestrian. Therefore, these models cannot fully learn the spatial sensing behavior of pedestrians. To solve this, we recommend using multi-channel tensors to represent the environmental information of pedestrians. Meanwhile, the spatiotemporal interactions among the pedestrians are represented by convolution operations on these tensors. Then, an end-to-end fully convolutional LSTM encoder–decoder is designed, trained and tested. Finally, our approach is compared with existing LSTM-based methods using five crowded video sequences from public datasets. The results show that our method reduces the displacement offset error and provides more realistic trajectory prediction in manifold cases.
physics, multidisciplinary
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses pedestrian behavior prediction, particularly in crowded traffic scenarios. Existing trajectory prediction methods based on deep neural networks have made progress, but their accuracy remains limited. The main reason is that the internal state of a typical long short-term memory (LSTM) network is a one-dimensional vector, which destroys the spatial information around pedestrians, so these models cannot fully learn pedestrians' spatially aware behavior. To overcome this, the paper represents each pedestrian's environment with multi-channel tensors and extracts spatial features through convolution operations. On this basis, it designs an end-to-end fully convolutional LSTM encoder-decoder network to predict pedestrian trajectories. Experimental results show that the method reduces displacement error on multiple public datasets and provides more realistic trajectory predictions.

### Key technologies and methods

1. **Multi-channel tensor representation**:
   - Use multi-channel tensors to represent the environmental information around pedestrians, including pedestrian positions, speeds, obstacle positions and the spatial layout of the entire scene.
   - Extract the spatial features of each pedestrian through convolution operations, so that the information fed into the model retains its spatial structure.
2. **Fully convolutional LSTM encoder-decoder network**:
   - Design an end-to-end fully convolutional LSTM encoder-decoder network to predict pedestrian trajectories.
   - The encoder generates a high-level representation of the pedestrian's historical trajectory through stacked Conv-LSTM cells.
   - The decoder further processes the high-level representation generated by the encoder and outputs the pedestrian's positions at future time steps.
3. **Loss function and training**:
   - Use the Smooth-L1 loss function to reduce the influence of outliers on the loss value.
   - Train with the Adam optimization algorithm, which computes first-order and second-order moment estimates of the gradient and derives independent adaptive learning rates for different parameters.

### Experimental results

- **Dataset**: Experiments are carried out on two public datasets, ETH and UCY. Together they contain 5 crowded scenes with a total of 1,536 pedestrians exhibiting complex interaction behaviors.
- **Evaluation metrics**:
  - Average Displacement Error (ADE): the average Euclidean distance between the predicted and ground-truth positions over all predicted time steps.
  - Final Displacement Error (FDE): the average Euclidean distance between the final predicted position and the ground-truth position at the end of the prediction horizon.
- **Experimental results**:
  - The proposed model outperforms the existing LSTM-based and CNN-based methods on multiple metrics, especially in terms of prediction accuracy and computational efficiency.

### Conclusion

By using multi-channel tensors and a fully convolutional LSTM encoder-decoder network, the proposed method addresses the loss of spatial information in pedestrian trajectory prediction and improves the accuracy and realism of the predictions. The experimental results support the effectiveness of the method.
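To make the Smooth-L1 loss and the ADE and FDE metrics mentioned above concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `smooth_l1`, `ade` and `fde`, the threshold parameter `beta` (set to 1.0, a common default), and the representation of a trajectory as a list of `(x, y)` tuples are all hypothetical choices made for this example.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth-L1 loss over paired scalar values.

    Quadratic for small errors, linear for large ones, so outliers
    contribute less than they would under a squared-error loss.
    `beta` is an assumed threshold; the summary does not state one.
    """
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


def _distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def ade(pred_trajs, true_trajs):
    """Average Displacement Error over all pedestrians and time steps."""
    dists = [
        _distance(p, t)
        for pred, true in zip(pred_trajs, true_trajs)
        for p, t in zip(pred, true)
    ]
    return sum(dists) / len(dists)


def fde(pred_trajs, true_trajs):
    """Final Displacement Error: mean distance at the last predicted step."""
    dists = [
        _distance(pred[-1], true[-1])
        for pred, true in zip(pred_trajs, true_trajs)
    ]
    return sum(dists) / len(dists)


# Two hypothetical pedestrians, three predicted time steps each.
predicted = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.0), (0.0, 0.0), (3.0, 4.0)],
]
ground_truth = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)],
]

ade_value = ade(predicted, ground_truth)
fde_value = fde(predicted, ground_truth)
loss_value = smooth_l1([0.5, 3.0], [0.0, 0.0])
```

In this toy data the per-step distances are 0, 1, 2 for the first pedestrian and 0, 0, 5 for the second, so ADE averages all six values while FDE averages only the last step of each trajectory (2 and 5). The Smooth-L1 call shows the two regimes: the error of 0.5 falls in the quadratic branch and the error of 3.0 in the linear branch.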