Joint-Aware Transformer: An Inter-Joint Correlation Encoding Transformer for Short-Term 3D Human Motion Prediction

Chang Liu, Satoshi Yagi, Satoshi Yamamori, Jun Morimoto
DOI: https://doi.org/10.1109/access.2024.3484660
IF: 3.9
2024-11-01
IEEE Access
Abstract: 3D skeleton-based human motion prediction, a classic task in computer vision, aims to forecast subsequent motions from historical motion observations. In particular, precise short-term motion prediction is crucial for the effectiveness of machines designed for real-time human-computer interaction. This study aims to achieve accurate predictions of human motion within timeframes of less than 400 milliseconds, with the goal of improving machine responsiveness and efficiency. Previous research has predominantly relied on sequence models such as Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, or Transformers. Transformer-based methods have relied on the Transformer's temporal forecasting ability. In this study, we instead introduce an alternative perspective that lets Transformer-based models comprehend the structure of the skeleton, named Joint-Aware Transformer (JAT). Within our model, the attention mechanism is employed to encode inter-joint correlation instead of temporal dependencies. Our approach outperformed the state-of-the-art (SOTA) model in short-term prediction on three types of human motion datasets.
computer science, information systems; telecommunications; engineering, electrical & electronic
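
The abstract's central idea, attention applied across joints rather than across time steps, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how joints are tokenized, so the sketch assumes each joint's token is its flattened trajectory over the observed frames. The names `joint_tokens` and `joint_attention`, the random projection matrices, and all sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_tokens(motion):
    """Turn a motion clip (T frames x J joints x 3 coords) into J tokens.

    Assumption: one token per joint, holding that joint's whole trajectory
    (length T * 3). Time is folded into the feature dimension.
    """
    num_joints = len(motion[0])
    return [[c for frame in motion for c in frame[j]] for j in range(num_joints)]


def joint_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over the joint axis.

    Returns the attended joint features (J x D) and the J x J weight
    matrix, whose entry (i, j) says how strongly joint i attends to joint j.
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def rand_matrix(rows, cols, rng):
    """Small random projection matrix (stands in for learned weights)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 10 observed frames, 5 joints, 4-dimensional projections.
T, J, D = 10, 5, 4
rng = random.Random(0)
motion = [[[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(J)] for _ in range(T)]
w_q, w_k, w_v = (rand_matrix(T * 3, D, rng) for _ in range(3))

tokens = joint_tokens(motion)
features, weights = joint_attention(tokens, w_q, w_k, w_v)
```

Here the attention matrix is J x J (joint to joint). A temporally oriented Transformer would instead build one token per frame and produce a T x T matrix, which is the contrast the abstract draws.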