Multi-Domain and Multi-Task Learning for Human Action Recognition

An-An Liu,Ning Xu,Wei-Zhi Nie,Yu-Ting Su,Yong-Dong Zhang
DOI: https://doi.org/10.1109/tip.2018.2872879
IF: 10.6
2019-01-01
IEEE Transactions on Image Processing
Abstract:Domain-invariant (view-invariant and modality-invariant) feature representation is essential for human action recognition. Moreover, given a discriminative visual representation, it is critical to discover the latent correlations among multiple actions in order to facilitate action modeling. To address these problems, we propose a multi-domain and multi-task learning (MDMTL) method to: 1) extract domain-invariant information for multi-view and multi-modal action representation and 2) explore the relatedness among multiple action categories. Specifically, we present a sparse transfer learning-based method to co-embed multi-domain (multi-view and multi-modality) data into a single common space for discriminative feature learning. Additionally, visual feature learning is incorporated into the multi-task learning framework, with the Frobenius-norm regularization term and the sparse constraint term, for joint task modeling and task relatedness-induced feature learning. To the best of our knowledge, MDMTL is the first supervised framework to jointly realize domain-invariant feature learning and task modeling for multi-domain action recognition. Experiments conducted on the INRIA Xmas Motion Acquisition Sequences data set, the MSR Daily Activity 3D (DailyActivity3D) data set, and the Multi-modal & Multi-view & Interactive data set, which is the most recent and largest multi-view and multi-model action recognition data set, demonstrate the superiority of MDMTL over the state-of-the-art approaches.
What problem does this paper attempt to address?