Imitative Learning for Multi-Person Action Forecasting.

Yu-Ke Li, Pin Wang, Mang Ye, Ching-Yao Chan
DOI: https://doi.org/10.1145/3474085.3475187
2021-01-01
Abstract: Multi-person action forecasting is an emerging task and a pivotal step towards video understanding. The major challenge lies in estimating a distribution that characterizes the upcoming actions of all individuals in the scene. State-of-the-art solutions attempt to solve this problem via a step-by-step prediction procedure. However, they do not adequately address several limitations, namely compounding errors, the innate uncertainty of the future, and the spatio-temporal context. To handle these challenges, we put forth a novel imitative learning framework based on inverse reinforcement learning. Specifically, we aim to learn a policy that models the aforementioned distribution up to a future horizon through an objective that naturally solves the compounding errors. Such a policy is able to explore multiple plausible futures by extrapolating a series of latent variables and taking them into account when generating predictions. The impact of these latent variables is further investigated by optimizing the directed information. Moreover, we reason about the spatial context along with the temporal cue in a single pass using graph-structured data. Experimental results on two large-scale datasets show that our approach yields considerable improvements in both diversity and quality over recent leading studies.