Human Activity Recognition Based on Multi-Modal Fusion

Cheng Zhang, Tianqi Zu, Yibin Hou, Jian He, Shengqi Yang, Ruihai Dong
DOI: https://doi.org/10.1007/s42486-023-00132-x
2023-01-01
CCF Transactions on Pervasive Computing and Interaction
Abstract: In recent years, human activity recognition (HAR) methods have developed rapidly. However, most existing methods rely on a single input data modality and suffer from limited accuracy and robustness. In this paper, we present a novel multi-modal HAR architecture that fuses signals from RGB visual data and inertial measurement unit (IMU) data. For the RGB modality, a speed-weighted star RGB representation is proposed to aggregate temporal information, and a convolutional network is employed to extract features. For the IMU modality, the fast Fourier transform and a multi-layer perceptron are employed to extract the dynamic features of the IMU data. For feature fusion, a global soft attention layer is designed to adjust the weights according to the concatenated features, and L-softmax with soft voting is adopted to classify activities. The proposed method is evaluated on the UP-Fall dataset, where it achieves F1-scores of 0.92 on the 11-class classification task and 1.00 on the fall/non-fall binary classification task.
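The fusion steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names, the placeholder RGB features, and the attention scores are all assumptions made for illustration. In the paper the RGB features come from a convolutional network and the attention weights from a learned layer, whereas here the scores are simply passed in, and a naive DFT stands in for an optimized FFT.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitude spectrum of a real 1-D window (naive DFT, bins 0..n//2)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc) / n)
    return mags

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(features, scores):
    """Reweight concatenated features by softmax-normalised scores.

    Assumption: `scores` would come from a learned layer in a trained model.
    """
    weights = softmax(scores)
    return [w * f for w, f in zip(weights, features)]

def soft_vote(prob_rows):
    """Average per-window class probabilities; return (best class, averages)."""
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / len(prob_rows)
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# Hypothetical IMU window: a sine completing 2 cycles over 16 samples.
imu_window = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
imu_feats = dft_magnitudes(imu_window)

# Placeholder values standing in for CNN features of the RGB representation.
rgb_feats = [0.3, 0.7]

# Concatenate both modalities, then apply attention with placeholder scores.
fused = rgb_feats + imu_feats
attended = soft_attention(fused, scores=fused)

# Soft voting over three hypothetical per-window probability vectors.
label, avg = soft_vote([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
```

Soft voting averages the class probabilities before picking a label, so a single confident window cannot override several moderately confident ones the way a hard majority vote over per-window labels might.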
What problem does this paper attempt to address?