Multihuman Tracking Based on a Spatial–Temporal Appearance Match

Yuan Shen, Zhenjiang Miao
DOI: https://doi.org/10.1109/tcsvt.2013.2280073
IF: 5.859
2014-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: In this paper, we focus on improving appearance representation for multihuman tracking. Many previous methods extract low-level appearance features, such as color histograms and texture, sometimes combined with spatial information, for each frame. These methods ignore the temporal distribution of features, and per-frame features may be unstable due to illumination changes, human pose variation, and image noise. To address this, we propose a novel appearance representation called the spatial-temporal appearance model, based on the statistical distribution of a Gaussian mixture model (GMM). It represents the appearance of a tracklet as a whole with dynamic spatial and temporal information: the spatial information is the dynamic subregions, and the temporal information is the dynamic duration of each subregion. Each subregion is modeled as a weighted Gaussian component of the GMM. The online expectation-maximization (online EM) algorithm is used to estimate the GMM parameters. We then propose a tracklet association method using Bayesian prediction and Jensen-Shannon divergence. Bayesian prediction is used to predict the locations of targets, and the Jensen-Shannon divergence is used to compute the distance between the spatial-temporal appearance distributions of two tracklets. Finally, we test our approach on four challenging datasets (TRECVID, CAVIAR, ETH, and EPFL Terrace) and achieve good results.
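
To make the appearance distance concrete, the following is a minimal sketch of a Jensen-Shannon divergence between two Gaussian mixtures. It is not the authors' implementation: the abstract does not say how the divergence is evaluated, so this sketch assumes one-dimensional features and a Monte Carlo estimate (the divergence between GMMs has no closed form). The function names `gmm_pdf`, `gmm_sample`, and `js_divergence`, the sample count, and the example parameters are all hypothetical.

```python
import numpy as np


def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture evaluated at each point in x."""
    x = np.asarray(x, dtype=float)[:, None]        # shape (n, 1)
    w = np.asarray(weights, dtype=float)[None, :]  # shape (1, k)
    mu = np.asarray(means, dtype=float)[None, :]
    sd = np.asarray(stds, dtype=float)[None, :]
    comp = np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return (w * comp).sum(axis=1)


def gmm_sample(n, weights, means, stds, rng):
    """Draw n samples: pick a component by weight, then sample from it."""
    idx = rng.choice(len(weights), size=n, p=np.asarray(weights, dtype=float))
    mu = np.asarray(means, dtype=float)[idx]
    sd = np.asarray(stds, dtype=float)[idx]
    return rng.normal(mu, sd)


def js_divergence(gmm_p, gmm_q, n=20000, seed=0):
    """Monte Carlo estimate of the JS divergence (in nats) between two GMMs.

    Each GMM is a (weights, means, stds) tuple. The choice of estimator is
    an assumption of this sketch, not something stated in the abstract.
    """
    rng = np.random.default_rng(seed)

    def kl_to_midpoint(own, other):
        # KL(own || M) with M = (own + other) / 2, estimated on samples of own.
        x = gmm_sample(n, *own, rng)
        p = gmm_pdf(x, *own)
        m = 0.5 * (p + gmm_pdf(x, *other))
        return np.mean(np.log(p / m))

    return 0.5 * kl_to_midpoint(gmm_p, gmm_q) + 0.5 * kl_to_midpoint(gmm_q, gmm_p)


# Hypothetical appearance models for two tracklets (weights, means, stds).
tracklet_a = ([0.6, 0.4], [0.0, 3.0], [1.0, 1.0])
tracklet_b = ([0.5, 0.5], [1.0, 5.0], [1.0, 1.5])
distance = js_divergence(tracklet_a, tracklet_b)
```

The divergence is symmetric and bounded between 0 and ln 2, which makes it convenient as an association cost: a value near 0 suggests the two tracklets share an appearance model, while a value near ln 2 suggests they do not overlap.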