CIR-DFENet: Incorporating Cross-Modal Image Representation and Dual-Stream Feature Enhanced Network for Activity Recognition
Yuliang Zhao, Jinliang Shao, Xiru Lin, Tianang Sun, Jian Li, Chao Lian, Xiaoyong Lyu, Binqiang Si, Zhikun Zhan
DOI: https://doi.org/10.1016/j.eswa.2024.125912
IF: 8.5
2024-01-01
Expert Systems with Applications
Abstract: Human activity recognition (HAR) based on wearable sensors is widely used in fields such as health monitoring, healthcare, and fitness because of its portability, accuracy, and real-time capability. Current advanced approaches convert time series into images and apply deep learning for recognition, which addresses the subjectivity and dependence on data quality inherent in traditional processing methods. However, each existing method for converting time-series data into images focuses on only one type of feature representation, so no single method fully characterizes the data, and activity recognition accuracy suffers as a result. To address this problem, we propose a novel method for cross-modal image representation of time series and a dual-stream feature enhanced network model, which together enable HAR based on a single-node wearable sensor. First, three methods, Markov Transition Field (MTF), Recurrence Plot (RP), and Gramian Angular Field (GAF), are employed to encode the time series into the three channels (R, G, B) of a color image, facilitating the fusion of features such as amplitude variation, nonlinearity, and local temporal relationships. Then, in the image stream of the model, a multi-channel CNN with a Global Attention Mechanism (GAM) processes the images and captures both channel and spatial information. In the time-series stream, a network combining a CNN and Long Short-Term Memory (LSTM) with a Self-Attention Mechanism (SA) processes the raw time series and captures temporal features. In addition, residual structures are introduced to make the network easier to train, thereby enhancing overall performance. The experimental results demonstrate that the proposed model can accurately identify six different gymnastic activities, achieving an accuracy of 99.40%. This study provides a new direction for processing time series and supports better applications of wearable-sensor-based HAR.
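The cross-modal encoding step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the quantile binning (`n_bins`), the recurrence threshold (`eps_ratio`), the choice of the summation variant of the GAF, the channel assignment (MTF to R, RP to G, GAF to B), and the restriction to a single 1-D signal are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def minmax_scale(x, lo=-1.0, hi=1.0):
    """Linearly rescale an array to [lo, hi]; a constant array maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return lo + (hi - lo) * (x - x.min()) / span


def gramian_angular_field(x):
    """Summation GAF: cos(phi_i + phi_j), phi = arccos(series scaled to [-1, 1])."""
    s = np.clip(minmax_scale(x), -1.0, 1.0)
    phi = np.arccos(s)
    return np.cos(phi[:, None] + phi[None, :])


def recurrence_plot(x, eps_ratio=0.1):
    """Binary recurrence plot: 1 where |x_i - x_j| <= eps (eps_ratio is assumed)."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    eps = eps_ratio * (x.max() - x.min())
    return (dist <= eps).astype(float)


def markov_transition_field(x, n_bins=8):
    """MTF: transition probability between the quantile bins of x_i and x_j."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges split the values into n_bins states (assumed binning).
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)  # state index in 0 .. n_bins - 1 for each sample
    # Count first-order transitions between consecutive samples.
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (q[:-1], q[1:]), 1.0)
    # Normalize each row to probabilities; rows with no transitions stay zero.
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Spread the transition probabilities over every pair of time steps.
    return probs[q[:, None], q[None, :]]


def encode_rgb(x):
    """Stack MTF, RP, and GAF as the R, G, B channels, each scaled to [0, 1]."""
    mtf = markov_transition_field(x)            # already in [0, 1]
    rp = recurrence_plot(x)                     # values in {0, 1}
    gaf = (gramian_angular_field(x) + 1.0) / 2  # map [-1, 1] to [0, 1]
    return np.stack([mtf, rp, gaf], axis=-1)    # shape (n, n, 3)


# Hypothetical input: a synthetic sine wave standing in for one sensor axis.
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))
image = encode_rgb(signal)
```

For a series of length n the result is an n × n × 3 array that a 2-D CNN can consume as an ordinary color image, while the raw series remains available for the CNN-LSTM stream. A multi-axis sensor would need one such image per axis or some aggregation across axes, which the abstract does not describe.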