Flow-based Spatio-Temporal Structured Prediction of Motion Dynamics

Mohsen Zand, Ali Etemad, Michael Greenspan
DOI: https://doi.org/10.1109/TPAMI.2023.3296446
2023-09-05
Abstract: Conditional Normalizing Flows (CNFs) are flexible generative models capable of representing complicated distributions with high dimensionality and large interdimensional correlations, making them appealing for structured output learning. Their effectiveness in modelling multivariate spatio-temporal structured data has yet to be completely investigated. We propose MotionFlow as a novel normalizing flows approach that autoregressively conditions the output distributions on the spatio-temporal input features. It combines deterministic and stochastic representations with CNFs to create a probabilistic neural generative approach that can model the variability seen in high dimensional structured spatio-temporal data. We specifically propose to use conditional priors to factorize the latent space for the time dependent modeling. We also exploit the use of masked convolutions as autoregressive conditionals in CNFs. As a result, our method is able to define arbitrarily expressive output probability distributions under temporal dynamics in multivariate prediction tasks. We apply our method to different tasks, including trajectory prediction, motion prediction, time series forecasting, and binary segmentation, and demonstrate that our model is able to leverage normalizing flows to learn complicated time dependent conditional distributions.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of learning spatio-temporal relationships in dynamic systems, with a particular focus on predicting complex trajectories. Existing methods struggle with highly uncertain stochastic sequential processes, especially when the dynamic relationships must be modeled automatically. They often fail to capture the wide range of plausible outputs and therefore produce over-smoothed predictions: as the prediction horizon grows, the estimated trajectories drift toward an average position with no distinct movement. Methods based on Generative Adversarial Networks (GANs) or Variational Auto-Encoders (VAEs) can learn spatial information, but they perform poorly on overly complex structures, are difficult to train, and are prone to mode collapse, posterior collapse, or vanishing gradients. To overcome these problems, the paper proposes MotionFlow, a flow-based structured prediction model. MotionFlow combines deterministic and stochastic representations with Conditional Normalizing Flows (CNFs) to form a probabilistic neural generative approach that can model the variability in high-dimensional structured spatio-temporal data. The model uses conditional priors to factorize the latent space for time-dependent modeling and uses masked convolutions as autoregressive conditionals in the CNFs, which lets it define arbitrarily expressive output probability distributions under temporal dynamics in multivariate prediction tasks. In summary, the main objective is to predict the future states of dynamic systems more accurately and consistently, in tasks such as trajectory prediction, motion prediction, time series forecasting, and binary segmentation.
By extending normalizing flows to spatio-temporal structured prediction, MotionFlow demonstrates that flow-based models can learn complicated time-dependent conditional distributions.
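To make the idea of an autoregressive conditional flow more concrete, the following is a minimal sketch in plain Python. It is not the MotionFlow architecture: the strictly causal convolution stands in for a masked convolution, the fixed kernels `w_mu` and `w_s` stand in for learned weights, the per-step context `c` stands in for spatio-temporal input features, and the base distribution is a standard normal rather than a learned conditional prior. All function and variable names are assumptions made for illustration.

```python
import math
import random

def causal_conv1d(x, kernel):
    """Strictly causal (masked) 1-D convolution: out[t] uses only x[t-k..t-1]."""
    k = len(kernel)
    padded = [0.0] * k + list(x)  # left-pad so step t never sees x[t] or later
    return [sum(w * v for w, v in zip(kernel, padded[t:t + k]))
            for t in range(len(x))]

def conditioner(x, c, w_mu, w_s):
    """Shift and log-scale for each step from past values x[<t] and context c[t]."""
    mu = [m + ct for m, ct in zip(causal_conv1d(x, w_mu), c)]
    s = [math.tanh(v) for v in causal_conv1d(x, w_s)]  # bounded log-scale
    return mu, s

def forward(x, c, w_mu, w_s):
    """Map data x to latent z and return log|det dz/dx|."""
    mu, s = conditioner(x, c, w_mu, w_s)
    z = [(xt - m) * math.exp(-st) for xt, m, st in zip(x, mu, s)]
    # The Jacobian is triangular with diagonal exp(-s_t), so log-det = -sum(s).
    return z, -sum(s)

def log_likelihood(x, c, w_mu, w_s):
    """Change of variables: log p(x|c) = log N(z; 0, I) + log|det dz/dx|."""
    z, log_det = forward(x, c, w_mu, w_s)
    log_base = sum(-0.5 * (zt * zt + math.log(2 * math.pi)) for zt in z)
    return log_base + log_det

def inverse(z, c, w_mu, w_s):
    """Generate x from z one step at a time, since step t needs x[<t]."""
    x = [0.0] * len(z)
    for t in range(len(z)):
        mu, s = conditioner(x, c, w_mu, w_s)  # entries x[>=t] do not affect index t
        x[t] = z[t] * math.exp(s[t]) + mu[t]
    return x

# Hypothetical toy inputs: an 8-step sequence with a simple ramp as context.
random.seed(0)
T = 8
c = [0.1 * t for t in range(T)]
w_mu, w_s = [0.5, -0.3, 0.2], [0.1, 0.2, -0.1]
x = [random.gauss(0.0, 1.0) for _ in range(T)]
z, log_det = forward(x, c, w_mu, w_s)
x_rec = inverse(z, c, w_mu, w_s)
```

In this sketch the density evaluation runs in a single pass, while sampling is sequential because each step depends on the values generated before it. Mapping `x` forward to `z` and then back through `inverse` recovers the original sequence up to floating-point error, which is the invertibility property that lets a flow assign exact likelihoods.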