A Spatio-Temporal Transformer Network for Human Motion Prediction in Human-Robot Collaboration

Yixiu Yuan, Baicun Wang, Ruirui Zhong, Bingtao Hu
DOI: https://doi.org/10.1109/icnsc62968.2024.10759898
2024-01-01
Abstract: Efficient human-robot collaboration necessitates bidirectional perception between humans and robots. For robots, understanding the operator's behavior is crucial for enhancing safety and work efficiency in human-robot collaboration. Human limb movements encompass diverse behavioral information, and accurate human motion prediction has attracted significant attention. Existing human motion prediction methods can extract some behavioral features, but they fall short of fully capturing the dynamic and complex interactions across different time points and body joints in human motion sequences. In this study, a Spatio-Temporal Transformer Network model (STTFN) is proposed to automatically learn the spatio-temporal dependency relationships in human motion sequence data for prediction. The spatio-temporal knowledge embedding block employs an attention mechanism and a graph attention network to extract spatio-temporal behavioral features from raw data. An encoder-decoder network based on Transformer and Long Short-Term Memory (LSTM) is constructed for further analysis to acquire motion prediction data. This study conducts an experiment utilizing a human-robot collaborative assembly dataset to validate the effectiveness of the proposed model. The results demonstrate that our model outperforms classical models, thereby advancing the field of human motion prediction.
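The abstract names attention across "different time points and body joints" but gives no implementation details. The sketch below is only an illustration of that general idea, not the authors' STTFN: it applies plain scaled dot-product self-attention first across joints within each frame, then across frames for each joint. The function names, the identity query/key/value projections, and the order of the two passes are all assumptions made for this example; the graph attention network, learned weights, and the Transformer/LSTM encoder-decoder described in the abstract are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Assumption for illustration: queries, keys, and values are the raw
    tokens themselves (no learned projection matrices).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all tokens.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def spatio_temporal_attention(motion):
    """motion[t][j] is the coordinate vector of joint j at frame t.

    Hypothetical two-pass scheme: spatial attention, then temporal attention.
    """
    n_frames, n_joints = len(motion), len(motion[0])
    # Spatial pass: within each frame, joints attend to one another.
    spatial = [self_attention(frame) for frame in motion]
    # Temporal pass: for each joint, its frames attend to one another.
    result = [[None] * n_joints for _ in range(n_frames)]
    for j in range(n_joints):
        track = self_attention([spatial[t][j] for t in range(n_frames)])
        for t in range(n_frames):
            result[t][j] = track[t]
    return result


# Toy input: 3 frames, 2 joints, 3-D coordinates (made-up values).
motion = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
    [[0.0, 2.0, 0.0], [1.0, 2.0, 0.0]],
]
features = spatio_temporal_attention(motion)
```

The output keeps the frames-by-joints-by-coordinates layout of the input, so in a fuller model it could be passed to a downstream sequence predictor. Because every output vector is a weighted average of input vectors, this unparameterized version can only smooth the data; a trained model would add learned projections to extract useful features.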