Abstract: RNN-based approaches have achieved outstanding performance on action recognition with skeleton inputs. Currently, these methods limit their inputs to joint coordinates and improve accuracy mainly by extending RNN models to the spatial domain in various ways. While such models explore relations between different body parts directly from joint coordinates, we provide a simple, universal spatial modeling method that is orthogonal to RNN model enhancement. Specifically, we select a set of simple geometric features, motivated by the evolution of previous work. With experiments on a 3-layer LSTM framework, we observe that geometric relational features based on distances between joints and selected lines outperform other features and achieve state-of-the-art results on four datasets. Further, we show the sparsity of the input gate weights in the first LSTM layer when trained on geometric features, and we demonstrate that using joint-line distances as input requires less training data.
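As an illustration of the joint-line distance feature mentioned above, the following is a minimal sketch, assuming 3D joint coordinates stored in a NumPy array and a line defined by two joints. The function names `joint_line_distance` and `joint_line_features` and the choice of line pairs are hypothetical and are not taken from the paper; the paper's actual line selection and any normalization may differ.

```python
import numpy as np


def joint_line_distance(joint, line_start, line_end):
    """Perpendicular distance from a 3D joint to the line through two other joints."""
    direction = line_end - line_start
    length = np.linalg.norm(direction)
    if length == 0.0:
        # Degenerate line (both endpoints coincide): fall back to point distance.
        return float(np.linalg.norm(joint - line_start))
    # |(J - J1) x (J2 - J1)| / |J2 - J1| is the height of the triangle J, J1, J2.
    return float(np.linalg.norm(np.cross(joint - line_start, direction)) / length)


def joint_line_features(skeleton, lines):
    """Build a per-frame feature vector of joint-line distances.

    skeleton: array of shape (num_joints, 3) for one frame.
    lines: list of (i, j) joint index pairs defining lines (hypothetical selection).
    """
    features = []
    for a, b in lines:
        for k in range(skeleton.shape[0]):
            if k in (a, b):
                continue  # skip the joints that define the line itself
            features.append(joint_line_distance(skeleton[k], skeleton[a], skeleton[b]))
    return np.asarray(features)


# Toy frame with three joints; the line runs through joints 0 and 1.
frame = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 3.0, 0.0]])
frame_features = joint_line_features(frame, [(0, 1)])
```

Computed per frame, such a vector could be stacked over time to form the input sequence of an LSTM in place of raw joint coordinates.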

Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks