Abstract: We present a novel approach for long-term human trajectory prediction in indoor human-centric environments, which is essential for long-horizon robot planning in these environments. State-of-the-art human trajectory prediction methods are limited by their focus on collision avoidance and short-term planning, and their inability to model complex interactions of humans with the environment. In contrast, our approach overcomes these limitations by predicting sequences of human interactions with the environment and using this information to guide trajectory predictions over a horizon of up to 60s. We leverage Large Language Models (LLMs) to predict interactions with the environment by conditioning the LLM prediction on rich contextual information about the scene. This information is given as a 3D Dynamic Scene Graph that encodes the geometry, semantics, and traversability of the environment into a hierarchical representation. We then ground these interaction sequences into multi-modal spatio-temporal distributions over human positions using a probabilistic approach based on continuous-time Markov Chains. To evaluate our approach, we introduce a new semi-synthetic dataset of long-term human trajectories in complex indoor environments, which also includes annotations of human-object interactions. We show in thorough experimental evaluations that our approach achieves a 54% lower average negative log-likelihood and a 26.5% lower Best-of-20 displacement error compared to the best non-privileged (i.e., evaluated in a zero-shot fashion on the dataset) baselines for a time horizon of 60s.

What problem does this paper attempt to address?

The paper addresses the prediction of human trajectories over horizons of up to 60 seconds in complex indoor environments. Existing human trajectory prediction methods focus mainly on collision avoidance and short-term planning, which limits their usefulness in complex environments where human-environment interactions matter. These methods usually cannot model the complex interactions between humans and the environment, especially over long horizons (up to 60 seconds). The paper therefore proposes a method that predicts the sequence of interactions between a human and the environment and uses this information to guide trajectory prediction.

### Main contributions of the paper:

1. **Long-term human trajectory prediction**: Studies, for the first time, human trajectory prediction over horizons of up to 60 seconds in complex indoor environments, including complex human-object interactions.
2. **The LP2 method**: Combines zero-shot interaction sequence prediction with large language models (LLMs) and probabilistic trajectory prediction based on continuous-time Markov chains (CTMCs) to infer a multi-modal spatio-temporal distribution over future human positions.
3. **A new dataset**: Introduces a semi-synthetic dataset of 76 long-term human trajectories (each about 3 minutes long) set in two semantically rich environments, with annotations of human-object interactions.

### Method overview:

1. **Environment representation**: 3D dynamic scene graphs (DSGs) represent the environment as a hierarchy that encodes the geometry, semantics, and traversability of the scene.
2. **Interaction sequence prediction (ISP)**: An LLM predicts future interaction sequences, conditioned on the rich contextual information about the scene.
3. **Probabilistic trajectory prediction (PTP)**: The predicted interaction sequences are grounded into coherent trajectories, yielding a continuous spatio-temporal probability distribution over future human positions.

### Experimental results:

- **Performance evaluation**: The method is evaluated on several metrics, including negative log-likelihood (NLL) and Best-of-20 average displacement error (Bo20 ADE). For a 60-second horizon, LP2 achieves a 54% lower average NLL and a 26.5% lower Best-of-20 displacement error than the best non-privileged baselines, i.e., those evaluated zero-shot on the dataset.
- **Dataset characteristics**: The dataset covers a variety of scenarios, including trajectories that start during an interaction and trajectories that start during movement, which adds diversity and realism.

### Conclusion:

The proposed method advances long-term human trajectory prediction in complex indoor environments, particularly where complex human-environment interactions must be modeled. By combining LLMs with CTMCs, LP2 predicts future human trajectories more accurately than the compared baselines, which supports long-horizon planning and proactive robot behavior in human-centric environments.
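To make the role of the continuous-time Markov chain more concrete, the following is a minimal sketch of how a CTMC turns a predicted interaction sequence into a distribution over "where in the sequence is the person at time t". It is not the authors' implementation: the state names, the transition rates, and the helper `transient_distribution` are all hypothetical, and uniformization is simply one standard way to evaluate the transient distribution of a CTMC in pure Python.

```python
import math

# Hypothetical three-step interaction sequence (assumption, not from the paper).
STATES = ["walking_to_object", "interacting_with_object", "sequence_finished"]

# Generator matrix Q (rates in 1/s, assumed values). Each row sums to zero.
# Mean walking time 10 s, mean interaction time 20 s, last state is absorbing.
Q = [
    [-0.10, 0.10, 0.00],
    [0.00, -0.05, 0.05],
    [0.00, 0.00, 0.00],
]


def transient_distribution(Q, p0, t, tol=1e-12, max_terms=10_000):
    """Return p(t) = p0 * exp(Q * t) using uniformization.

    Suitable for small lam * t; very large values would underflow exp(-lam * t).
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0 or t == 0.0:
        return list(p0)
    # Embedded discrete-time kernel U = I + Q / lam (a stochastic matrix).
    U = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)  # Poisson probability of zero jumps
    v = list(p0)                 # distribution after k jumps of the kernel
    result = [weight * x for x in v]
    total = weight               # accumulated Poisson mass
    for k in range(1, max_terms):
        if 1.0 - total <= tol:
            break
        v = [sum(v[i] * U[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k    # Poisson probability of exactly k jumps
        result = [r + weight * x for r, x in zip(result, v)]
        total += weight
    return result


# The person starts in the first state; query the 60 s horizon.
p0 = [1.0, 0.0, 0.0]
p60 = transient_distribution(Q, p0, 60.0)
for name, prob in zip(STATES, p60):
    print(f"{name}: {prob:.4f}")
```

In a full pipeline, each discrete state would additionally be mapped to a spatial distribution (for example, positions along a path or around an object in the scene graph), and several predicted interaction sequences would be mixed to obtain a multi-modal result. That mapping is outside the scope of this sketch.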

Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs

Long-Term Human Motion Prediction with Scene Context

Predicting Long-Term Human Behaviors in Discrete Representations via Physics-Guided Diffusion

Indoor 3D Human Trajectory Reconstruction Using Surveillance Camera Videos and Point Clouds

Social LSTM: Human Trajectory Prediction in Crowded Spaces

An Efficient Spatial-Temporal Model Based on Gated Linear Units for Trajectory Prediction

Long-Short Term Spatio-Temporal Aggregation for Trajectory Prediction

Scene-LSTM: A Model for Human Trajectory Prediction

Human Trajectory Prediction using Spatially aware Deep Attention Models

TR-LLM: Integrating Trajectory Data for Scene-Aware LLM-Based Human Action Prediction

Adaptive Human Trajectory Prediction via Latent Corridors

FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations

SITUATE: Indoor Human Trajectory Prediction through Geometric Features and Self-Supervised Vision Representation

Situation-Aware Pedestrian Trajectory Prediction with Spatio-Temporal Attention Model

A Data-Efficient Approach for Long-Term Human Motion Prediction Using Maps of Dynamics

Human trajectory prediction and generation using LSTM models and GANs

From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

Human Trajectory Prediction Using Stacked Temporal Convolutional Network

Human Trajectory Prediction via Counterfactual Analysis