DGT: Dynamic Graph Transformer for Enhanced Processing of Dynamic Joint Sequences in 2D Human Pose Estimation

Xianwei Zhou, Zhenfeng Li, Songsen Yu
DOI: https://doi.org/10.1109/cvidl62147.2024.10603857
2024-01-01
Abstract: In video-based 2D human pose estimation, existing Transformer models focus mainly on image features and neglect the implicit information in skeletal connections. This oversight leads to inaccuracies in dynamic scenes and makes joint localization difficult under conditions such as motion blur or occlusion. To address this, our Dynamic Graph Transformer (DGT) integrates graph convolutional multi-head attention layers with a dynamic joint constraint module, which leverages Graph Neural Networks to extract dynamic joint structures from the outputs of the multi-head attention mechanism, improving the analysis of the structural invariance of skeletal connections. Our experiments on the Sub-JHMDB dataset show that the Dynamic Graph Transformer outperforms traditional Transformer models in accuracy and robustness, with a 1% improvement in the PCK@0.05 metric.
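For readers unfamiliar with the reported metric, the following is a minimal sketch of how PCK (Percentage of Correct Keypoints) at a threshold of 0.05 is commonly computed. It is not taken from the paper: the function name `pck`, the choice of reference length (assumed here to be the larger side of the person bounding box), and the sample coordinates are all illustrative assumptions.

```python
import math


def pck(pred, gt, bbox_size, alpha=0.05):
    """Fraction of joints predicted within alpha * bbox_size of the ground truth.

    pred, gt: lists of (x, y) joint coordinates in pixels, in the same order.
    bbox_size: reference length in pixels. ASSUMPTION: the larger side of the
    person bounding box; the paper's exact normalization is not stated here.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    threshold = alpha * bbox_size
    # Count joints whose Euclidean error does not exceed the threshold.
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return hits / len(gt)


# Hypothetical example: four joints, a 100-pixel reference length (threshold = 5 px).
gt = [(10.0, 10.0), (50.0, 50.0), (90.0, 20.0), (30.0, 80.0)]
pred = [(11.0, 10.0), (50.0, 58.0), (90.0, 23.0), (30.0, 80.0)]
score = pck(pred, gt, bbox_size=100.0)  # errors are 1, 8, 3, 0 px -> 3 of 4 correct
```

Under this definition, a tight threshold such as 0.05 rewards only precise localization, which is why small gains at PCK@0.05 are usually harder to obtain than gains at looser thresholds.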