A Multi-Task Deep Learning Approach for Sensor-based Human Activity Recognition and Segmentation

Furong Duan, Tao Zhu, Jinqiang Wang, Liming Chen, Huansheng Ning, Yaping Wan
2023-03-20
Abstract: Sensor-based human activity segmentation and recognition are two important and challenging problems in many real-world applications, and they have drawn increasing attention from the deep learning community in recent years. Most existing deep learning approaches are designed for pre-segmented sensor streams and treat activity segmentation and recognition as two separate tasks. In practice, segmenting a data stream is very challenging. We believe that activity segmentation and recognition each convey unique information that can complement the other and improve the performance of both tasks. In this paper, we propose a new multitask deep neural network that solves the two tasks simultaneously. The proposed network adopts selective convolution and uses multiscale windows to segment activities of long or short duration. First, multiple windows of different scales are generated, centered on each unit of the feature sequence. Then, the model is trained to predict, for each window, the activity class and the offset to the true activity boundaries. Finally, overlapping windows are filtered out by non-maximum suppression, and adjacent windows of the same activity are concatenated to complete the segmentation task. Extensive experiments were conducted on eight popular benchmark datasets, and the results show that the proposed method outperforms state-of-the-art methods for both activity recognition and segmentation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two important and challenging problems: sensor-based human activity recognition (AR) and activity segmentation (AS). Specifically:

1. **Limitations of existing methods**:
   - Most existing deep learning approaches are designed for pre-segmented sensor data streams and treat activity segmentation and recognition as two independent tasks.
   - In practice, segmenting a data stream is very challenging, especially in real-time applications.
   - If segmentation is used as a pre-processing step, segmentation errors may propagate to subsequent steps and degrade overall performance.
2. **Proposed solution**:
   - The authors argue that activity segmentation and recognition convey unique information and can complement each other, thereby improving the performance of both tasks.
   - The paper proposes a new multi-task deep neural network framework (MTHARS) that solves the activity segmentation and recognition problems simultaneously.
   - The framework uses selective convolution and multi-scale windows to handle activities of different durations.
3. **Specific objectives**:
   - **Effectively combine activity segmentation and recognition**: Perform both tasks simultaneously within a multi-task learning framework to improve overall performance.
   - **Handle dynamic activity lengths**: Propose a multi-scale window splicing method that adapts to activities of different lengths.
   - **Verify model performance**: Experiments on eight popular datasets show that MTHARS outperforms existing methods, and ablation studies analyze the influence of key factors.
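The window pipeline described in the abstract (multi-scale windows centered on each sequence unit, non-maximum suppression, then concatenation of adjacent same-class windows) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(start, end, label, score)` tuple layout, the class-agnostic greedy suppression, the IoU threshold of 0.5, and the `gap` tolerance are all assumptions made for the example.

```python
def make_windows(seq_len, scales):
    """Center one window of every scale on each unit of the feature sequence.
    Returns (start, end) pairs; windows may extend past the sequence ends."""
    return [(c - s / 2, c + s / 2) for c in range(seq_len) for s in scales]


def iou_1d(a, b):
    """Intersection over union of two 1-D intervals (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(dets, iou_thr=0.5):
    """Greedy, class-agnostic NMS (an assumption) over (start, end, label, score).
    Keeps the highest-scoring window and drops windows that overlap it too much."""
    keep = []
    for d in sorted(dets, key=lambda d: d[3], reverse=True):
        if all(iou_1d(d[:2], k[:2]) <= iou_thr for k in keep):
            keep.append(d)
    return sorted(keep, key=lambda d: d[0])  # chronological order


def merge_adjacent(dets, gap=0.0):
    """Concatenate consecutive windows that share a label and lie at most
    `gap` apart. Expects dets sorted by start; returns (start, end, label)."""
    merged = []
    for s, e, lab, _score in dets:
        if merged and merged[-1][2] == lab and s - merged[-1][1] <= gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e), lab)
        else:
            merged.append((s, e, lab))
    return merged


# Hypothetical per-window predictions: (start, end, label, score)
dets = [
    (0, 10, "walk", 0.90),
    (1, 11, "walk", 0.80),   # overlaps the first window heavily
    (10, 20, "walk", 0.70),
    (20, 30, "sit", 0.95),
]
segments = merge_adjacent(nms_1d(dets))
print(segments)
```

With these made-up detections, the second window is suppressed because its IoU with the first exceeds the threshold, and the two remaining "walk" windows are concatenated into one segment.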
### Summary of mathematical formulas

- **Offset calculation**:

  \[ f_x = \frac{t_x - w_x}{w_l} \quad (1) \]

  \[ f_l = \log\left(\frac{t_l}{w_l}\right) \quad (2) \]

- **Predict activity boundaries**:

  \[ \hat{t}_x = f_x w_l + w_x \quad (3) \]

  \[ \hat{t}_l = w_l \exp(f_l) \quad (4) \]

- **Loss function**:

  \[ L_{loc}(f, \hat{f}) = \sum_{i \in \{x, l\}} \text{Smooth L1}(f_i - \hat{f}_i) \quad (5) \]

  where

  \[ \text{Smooth L1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} \quad (6) \]

  \[ L_{conf}(a, \hat{a}) = -\sum_{i=1}^{n} a_i \log(\hat{a}_i) \quad (7) \]

  \[ L(a, \hat{a}, f, \hat{f}) = \frac{1}{N} \left( \alpha L_{conf}(a, \hat{a}) + \beta L_{loc}(f, \hat{f}) \right) \quad (8) \]

With these methods and formulas, the paper proposes a more effective multi-task learning framework that performs sensor-based human activity recognition and segmentation simultaneously.
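Equations (1) to (8) can be checked numerically with a small sketch like the one below. The summary does not define the symbols, so the code assumes that `t_x` and `t_l` are the center and length of the true activity segment, `w_x` and `w_l` are the center and length of a window, `a` is a one-hot label vector, `a_hat` is a predicted probability vector, and `n` is the normalizer N in Eq. (8). The function names and the `eps` clamp are illustrative additions, not part of the paper.

```python
import math


def encode_offsets(t_x, t_l, w_x, w_l):
    """Eqs. (1)-(2): offsets of a true segment relative to a window."""
    f_x = (t_x - w_x) / w_l
    f_l = math.log(t_l / w_l)
    return f_x, f_l


def decode_offsets(f_x, f_l, w_x, w_l):
    """Eqs. (3)-(4): recover segment center and length from offsets."""
    t_x = f_x * w_l + w_x
    t_l = w_l * math.exp(f_l)
    return t_x, t_l


def smooth_l1(x):
    """Eq. (6): quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def loc_loss(f, f_hat):
    """Eq. (5): Smooth L1 summed over the (x, l) offset pair."""
    return sum(smooth_l1(fi - fhi) for fi, fhi in zip(f, f_hat))


def conf_loss(a, a_hat, eps=1e-12):
    """Eq. (7): cross-entropy; eps (an addition) guards against log(0)."""
    return -sum(ai * math.log(max(pi, eps)) for ai, pi in zip(a, a_hat))


def total_loss(a, a_hat, f, f_hat, n=1, alpha=1.0, beta=1.0):
    """Eq. (8): weighted sum of both losses, divided by the normalizer n."""
    return (alpha * conf_loss(a, a_hat) + beta * loc_loss(f, f_hat)) / n


# Round trip: a segment centered at 50 with length 40, seen from a
# window centered at 45 with length 20 (made-up numbers).
f = encode_offsets(50.0, 40.0, 45.0, 20.0)
print(f)                                    # offsets (f_x, f_l)
print(decode_offsets(f[0], f[1], 45.0, 20.0))  # approximately (50.0, 40.0)
```

Decoding is the exact inverse of encoding, which is why Eqs. (3) and (4) recover the segment from the offsets. The log-scale length offset keeps the regression target symmetric for windows that are too short or too long by the same factor.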