MLDT: Multi-task Learning with Denoising Transformer for Gait Identity and Emotion Recognition

Weijie Sheng, Xiaoyan Lu, Xinde Li
DOI: https://doi.org/10.1145/3508259.3508266
2021-01-01
Abstract: The dynamics of body skeletons convey significant information for human gait recognition. However, current skeleton-based gait recognition methods usually assume complete skeletons. If noisy or incomplete data are fed to a model directly, without correction, its performance may deteriorate significantly. This paper proposes a novel Multi-task Learning with Denoising Transformer Network (MLDT) for gait-related recognition tasks, built on a pure transformer framework, the Vision Transformer (ViT). With several adaptations, a reconstruction head is added in parallel to the transformer encoder head to correct missing points and outliers in joint trajectories, which allows the model to capture more discriminative spatiotemporal patterns through semi-supervised learning. Experimental results show that the model achieves state-of-the-art performance on identity and emotion recognition benchmarks.
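The idea of training recognition heads alongside a reconstruction head can be illustrated with a small sketch. This is not the authors' implementation: the corruption scheme, the loss weights, the function names (`corrupt`, `multitask_loss`), and the choice to let unlabeled samples contribute only the reconstruction term are all assumptions made here for illustration, and the transformer encoder itself is omitted.

```python
import math
import random


def corrupt(traj, drop_prob=0.2, outlier_prob=0.05, outlier_scale=5.0, seed=0):
    """Simulate missing points (zeroed) and outliers in one joint trajectory.

    Hypothetical corruption scheme; returns the noisy values and a mask
    that is True wherever a value was altered.
    """
    rng = random.Random(seed)
    noisy, mask = [], []
    for x in traj:
        r = rng.random()
        if r < drop_prob:  # missing point
            noisy.append(0.0)
            mask.append(True)
        elif r < drop_prob + outlier_prob:  # outlier
            noisy.append(x + rng.uniform(-outlier_scale, outlier_scale))
            mask.append(True)
        else:  # untouched
            noisy.append(x)
            mask.append(False)
    return noisy, mask


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single sample, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def mse(pred, target):
    """Mean squared error between a reconstruction and the clean trajectory."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multitask_loss(id_logits, id_label, emo_logits, emo_label, recon, clean,
                   w_id=1.0, w_emo=1.0, w_rec=1.0):
    """Weighted sum of identity, emotion, and reconstruction losses.

    Assumption: a label of None marks an unlabeled sample, which then
    contributes only through the reconstruction term.
    """
    loss = w_rec * mse(recon, clean)
    if id_label is not None:
        loss += w_id * cross_entropy(id_logits, id_label)
    if emo_label is not None:
        loss += w_emo * cross_entropy(emo_logits, emo_label)
    return loss


# Usage: corrupt a toy trajectory, then score a (here, perfect) reconstruction.
clean = [math.sin(0.3 * t) for t in range(20)]
noisy, mask = corrupt(clean)
loss = multitask_loss([0.0, 0.0], 0, [0.0, 0.0, 0.0, 0.0], 2, clean, clean)
```

In a full model, `noisy` would be the encoder input and `recon` the output of the reconstruction head, so the reconstruction term pushes the shared representation to recover the clean trajectory while the two classification terms use the same features.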