Combined Trajectories for Action Recognition Based on Saliency Detection and Motion Boundary.
Xiaofang Wang, Chun Qi, Fei Lin
DOI: https://doi.org/10.1016/j.image.2017.05.007
IF: 3.453
2017-01-01
Signal Processing: Image Communication
Abstract: To exploit trajectories from different areas of a video effectively for action representation, this paper proposes to extract three sets of trajectories independently, namely the trajectories of action-related areas, the trajectories of action-related motion boundaries, and dense trajectories, and then to concatenate their representations to obtain the final representation of the video. The key to extracting the first two sets of trajectories is to first detect the action-related areas in each frame. We accomplish this by applying sparse representation, at the patch level, to the motion of the subvideo centered at the current frame. To this end, we spatially divide the subvideo into patches. For each patch, we learn a weighted sparse representation of its motion vector using a dictionary constructed from the motion vectors of all the remaining patches, and then use the reconstruction error to measure the patch's saliency. Based on the saliency of all patches in a frame, a saliency map is obtained to indicate the action-related areas. On the one hand, this map is incorporated into dense tracking to extract the trajectories of action-related areas; on the other hand, it is used as a mask to filter out background motion boundaries, so that the action-related motion boundary trajectories are derived. Experiments on four benchmark datasets, namely Hollywood2, YouTube, HMDB51 and UCF101, demonstrate the effectiveness of the method.
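The patch-saliency step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions: the subvideo's optical flow is given as an array of shape (T, H, W, 2); each patch's motion vector is its flattened flow over all frames; the weighted sparse code is obtained with ISTA (a standard proximal-gradient solver, swapped in here); and the distance-based weights, `lam`, `sigma`, the patch size and the 0.5 mask threshold are hypothetical choices. All function names (`patch_motion_vectors`, `weighted_lasso_ista`, `patch_saliency`, `mask_motion_boundaries`) are illustrative only.

```python
import numpy as np


def patch_motion_vectors(flow, patch):
    """Split subvideo flow of shape (T, H, W, 2) into non-overlapping
    patch x patch cells. Column k of the result is patch k's motion
    vector: its flow over all T frames, flattened."""
    T, H, W, _ = flow.shape
    gh, gw = H // patch, W // patch
    f = flow[:, :gh * patch, :gw * patch]        # crop to a whole grid
    f = f.reshape(T, gh, patch, gw, patch, 2)
    f = f.transpose(1, 3, 0, 2, 4, 5)             # (gh, gw, T, p, p, 2)
    return f.reshape(gh * gw, -1).T, (gh, gw)     # (dim, n_patches)


def weighted_lasso_ista(y, D, w, lam, n_iter=200):
    """Minimise 0.5 * ||y - D a||^2 + lam * sum_j w_j * |a_j| with ISTA
    (proximal gradient descent with a fixed 1/L step)."""
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient
    if L == 0.0:                                  # all-zero dictionary: a = 0
        return a
    for _ in range(n_iter):
        z = a + D.T @ (y - D @ a) / L             # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)  # weighted soft-threshold
    return a


def patch_saliency(flow, patch=8, lam=0.1, sigma=2.0):
    """Saliency of each patch = reconstruction error of its motion vector
    from the motion vectors of all other patches, rescaled to [0, 1]."""
    X, (gh, gw) = patch_motion_vectors(flow, patch)
    n = X.shape[1]
    rows, cols = np.divmod(np.arange(n), gw)      # grid position of each patch
    err = np.zeros(n)
    for i in range(n):
        others = np.arange(n) != i
        D = X[:, others]                          # dictionary: all remaining patches
        d2 = (rows[others] - rows[i]) ** 2 + (cols[others] - cols[i]) ** 2
        # HYPOTHETICAL weights: nearby patches cost more to use, so a patch
        # is explained mainly by distant (likely background) patches.
        w = 1.0 + np.exp(-d2 / (2.0 * sigma ** 2))
        a = weighted_lasso_ista(X[:, i], D, w, lam)
        err[i] = np.sum((X[:, i] - D @ a) ** 2)
    span = err.max() - err.min()
    sal = (err - err.min()) / span if span > 0 else np.zeros(n)
    return sal.reshape(gh, gw)


def mask_motion_boundaries(boundary, sal, patch, thresh=0.5):
    """Keep motion-boundary responses only inside patches whose saliency is
    at least `thresh`; pixels outside the patch grid are zeroed."""
    keep = np.repeat(np.repeat(sal >= thresh, patch, axis=0), patch, axis=1)
    out = np.zeros_like(boundary)
    h, w = keep.shape
    out[:h, :w] = np.where(keep, boundary[:h, :w], 0)
    return out


# Toy example: uniform camera-like motion plus one 8x8 region moving differently.
flow = np.zeros((3, 32, 32, 2))
flow[..., 0] = 0.2
flow[:, 8:16, 8:16] = (1.0, 0.5)
sal = patch_saliency(flow)
print(np.round(sal, 2))   # the cell at grid position (1, 1) should stand out
```

In the toy example, background patches share the same motion and reconstruct one another almost perfectly, while the differently moving patch cannot be reconstructed from them, so its error (and saliency) is highest. The weighting scheme is the main free design choice here; the paper's actual weights may differ.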