Abstract: Human motion forecasting is an important and challenging task in many computer vision application domains. Recent work concentrates on using the temporal modeling ability of recurrent neural networks (RNNs) to achieve smooth and reliable short-term predictions. However, as previous work has shown, RNNs suffer from error accumulation, leading to unreliable results. In this paper, we propose a simple feed-forward deep neural network for motion prediction that takes into account temporal smoothness between frames and spatial dependencies between human body joints. We design Lightweight Multiscale Spatiotemporal Locally Connected Graph Convolutional Networks (MST-LCGCN) for single human motion forecasting to implicitly establish the spatiotemporal dependence in human movement, where different scales are fused dynamically during training. The entire model is action-agnostic and follows an encoder-decoder framework. The encoder consists of temporal GCNs (TGCNs), which capture motion features between frames, and locally connected spatial GCNs (SGCNs), which extract spatial structure among joints. The decoder uses temporal convolution networks (TCNs) to maintain its extensibility for long-term prediction. Extensive experiments show that our approach outperforms previous methods on the Human3.6M and CMU Mocap datasets while requiring far fewer parameters.

Note to Practitioners: Accuracy and real-time performance are the two most important evaluation criteria for human motion forecasting. Existing methods tend to use models with a very large number of parameters, sacrificing speed for a small gain in accuracy. In practical scenarios, however, slow inference makes predictions meaningless. Therefore, we propose a lightweight MST-LCGCN network to learn human action patterns over time.
To obtain higher accuracy, we extract features along both the spatial and temporal dimensions to capture more information; to obtain faster inference, we avoid unnecessary network depth as much as possible. We demonstrate the advantages of our model in efficiency and accuracy through extensive quantitative and qualitative experiments on two datasets. Our network can help robots avoid obstacles in advance and compensate for network delays, and we plan to apply it in real-world settings in the future.
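
As a rough illustration of the spatial part of the encoder described above, the following sketch shows one generic graph-convolution step in which each joint aggregates features only from itself and the joints it is directly connected to. It is not the MST-LCGCN implementation: the 4-joint chain skeleton, the row-normalized adjacency, the identity weight matrix, and all function names are assumptions made for illustration, and reading "locally connected" as "restricted to bone-connected neighbors" is only one possible interpretation.

```python
# Illustrative sketch only: NOT the authors' MST-LCGCN code.
# The toy skeleton, normalization, and all names below are assumptions.

def local_adjacency(num_joints, edges):
    """Row-normalized adjacency with self-loops.

    Entries are nonzero only for a joint itself and its direct neighbors.
    """
    adj = [[0.0] * num_joints for _ in range(num_joints)]
    for j in range(num_joints):
        adj[j][j] = 1.0
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0
    for row in adj:
        total = sum(row)
        for k in range(num_joints):
            row[k] /= total
    return adj


def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def spatial_graph_conv(features, adj, weight):
    """One step: aggregate neighbor features (adj @ features), then mix channels (@ weight)."""
    return matmul(matmul(adj, features), weight)


# Hypothetical 4-joint chain skeleton: 0 - 1 - 2 - 3
EDGES = [(0, 1), (1, 2), (2, 3)]
ADJ = local_adjacency(4, EDGES)

# One frame of 3D joint coordinates (toy data, one row per joint).
FRAME = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 3.0, 0.0]]

# Identity weights stand in for learned parameters.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

OUT = spatial_graph_conv(FRAME, ADJ, IDENTITY)
```

With identity weights, each output row is simply the average of a joint's own coordinates and those of its neighbors; in a trained network the weight matrix would be learned, and a temporal counterpart would apply the same idea along the frame axis.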

A Shortcut Enhanced LSTM-GCN Network for Multi-Sensor Based Human Motion Tracking

Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

A Spatio-Temporal Transformer Network for Human Motion Prediction in Human-Robot Collaboration

DMS-GCN: Dynamic Mutiscale Spatiotemporal Graph Convolutional Networks for Human Motion Prediction

Lightweight Multiscale Spatiotemporal Locally Connected Graph Convolutional Networks for Single Human Motion Forecasting

An Attentional Spatial Temporal Graph Convolutional Network with Co-Occurrence Feature Learning for Action Recognition

TrajectoryCNN: A New Spatio-Temporal Feature Learning Network for Human Motion Prediction

Human Motion Prediction Based on Space-Time-Separable Graph Convolutional Network

AMHGCN: Adaptive multi-level hypergraph convolution network for human motion prediction

Full-body Human Motion Reconstruction with Sparse Joint Tracking Using Flexible Sensors

Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

Novel Deep Learning Network for Gait Recognition Using Multimodal Inertial Sensors

Gesture Tracking and Recognition Algorithm for Dynamic Human Motion Using Multimodal Deep Learning

Human Motion Tracking Using 3D Image Features with a Long Short-Term Memory Mechanism Model—An Example of Forward Reaching

Continuous Estimation of Human Joint Angles From sEMG Using a Multi-Feature Temporal Convolutional Attention-Based Network

Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction

April-GCN: Adjacency Position-velocity Relationship Interaction Learning GCN for Human motion prediction

Enhanced Spatial–temporal Dynamics in Pose Forecasting Through Multi-Graph Convolution Networks

STM-GCN: a spatiotemporal multi-graph convolutional network for pedestrian trajectory prediction