Monte Carlo Tree Search for Scheduling Activity Recognition

Mohamed R. Amer,Sinisa Todorovic,Alan Fern,Song-Chun Zhu
DOI: https://doi.org/10.1109/iccv.2013.171
2013-01-01
Abstract:This paper addresses recognition of human activities with stochastic structure, characterized by variable space-time arrangements of primitive actions, and conducted by a variable number of actors. Our approach classifies the activity of interest as well as identifies the relevant foreground in the video. Each activity representation is considered as a mixture distribution of BoWs captured by a Sum-Product Network (SPN). In our approach, SPN represents a linear mixture of many bags-of-words (BoWs) where each BoW represents an important foreground part of the activity. This mixture distribution is efficiently computed by organizing the BoWs in a hierarchy, where children BoWs are nested within parent BoWs. SPN allows us to model this mixture since it consists of terminal nodes representing BoWs, product nodes, and sum nodes organized in a number of layers. The products are aimed at encoding particular configurations of primitive actions, and the sums serve to capture their alternative configurations. SPN inference amounts to parsing the SPN graph, which yields the most probable explanation (MPE) of the video foreground. SPN inference has linear complexity in the number of nodes, under fairly general conditions, enabling fast and scalable recognition. The connectivity of SPN and the parameters of BoW distributions are learned under weak supervision using a variational EM algorithm. For our evaluation, we have compiled and annotated a new Volleyball dataset. Our classification accuracy and localization results are superior to those of the state of the art on current benchmarks as well as our Volleyball datasets.
What problem does this paper attempt to address?