Abstract: Sensor-based human activity segmentation and recognition are two important and challenging problems in many real-world applications, and they have drawn increasing attention from the deep learning community in recent years. Most existing deep learning works were designed for pre-segmented sensor streams and treat activity segmentation and recognition as two separate tasks. In practice, segmenting a data stream is very challenging. We believe that activity segmentation and recognition each convey unique information that can complement the other and improve the performance of both tasks. In this paper, we propose a new multitask deep neural network to solve the two tasks simultaneously. The proposed neural network adopts selective convolution and features multiscale windows to segment activities of long or short time durations. First, multiple windows of different scales are generated, centered on each unit of the feature sequence. Then, the model is trained to predict, for each window, the activity class and the offset to the true activity boundaries. Finally, overlapping windows are filtered out by non-maximum suppression, and adjacent windows of the same activity are concatenated to complete the segmentation task. Extensive experiments were conducted on eight popular benchmarking datasets, and the results show that our proposed method outperforms the state-of-the-art methods for both activity recognition and segmentation.

What problem does this paper attempt to address?

This paper addresses two important and challenging problems: sensor-based human Activity Recognition (AR) and Activity Segmentation (AS). Specifically:

1. **Limitations of existing methods**:
   - Most existing deep-learning works are designed for pre-segmented sensor data streams and treat activity segmentation and recognition as two independent tasks.
   - In practical applications, segmenting a data stream is very challenging, especially in real-time applications.
   - If data segmentation is used as a pre-processing step, segmentation errors may propagate to subsequent steps and affect the overall performance.
2. **Proposed solution**:
   - The authors believe that activity segmentation and recognition convey unique information and can complement each other, thereby improving the performance of both tasks.
   - The paper proposes a new multi-task deep neural network framework (MTHARS) that solves the activity segmentation and recognition problems simultaneously.
   - The framework adopts selective convolution and multi-scale windows to handle activities of different time lengths.
3. **Specific objectives**:
   - **Effectively combine activity segmentation and recognition**: Through the multi-task learning framework, perform activity segmentation and recognition simultaneously to improve the overall performance.
   - **Handle dynamic activity lengths**: Propose a multi-scale window splicing method to adapt to activities of different lengths.
   - **Verify model performance**: Experiments were carried out on eight popular datasets. The results show that MTHARS is superior to existing methods, and ablation studies analyze the influence of key factors.
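The post-processing described in the abstract (multi-scale windows centered on each unit, non-maximum suppression, then concatenation of adjacent same-activity windows) can be sketched roughly as follows. This is a minimal illustration, not the MTHARS implementation: the function names, the `(start, end, label, score)` tuple layout, the class-agnostic greedy NMS, the IoU threshold of 0.5, and the `gap` parameter are all assumptions made for the example.

```python
def generate_windows(seq_len, scales):
    """Place one window of every scale on each unit of the feature sequence.

    Returns (start, end) pairs clipped to [0, seq_len]. Hypothetical helper.
    """
    windows = []
    for center in range(seq_len):
        for scale in scales:
            start = max(0.0, center - scale / 2)
            end = min(float(seq_len), center + scale / 2)
            windows.append((start, end))
    return windows


def iou_1d(a, b):
    """Intersection over union of two 1-D intervals (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(preds, iou_threshold=0.5):
    """Greedy non-maximum suppression over (start, end, label, score) tuples.

    Assumption: suppression is class-agnostic and the threshold is 0.5.
    """
    kept = []
    for p in sorted(preds, key=lambda p: p[3], reverse=True):
        if all(iou_1d(p[:2], k[:2]) <= iou_threshold for k in kept):
            kept.append(p)
    return kept


def merge_adjacent(preds, gap=0.0):
    """Concatenate neighbouring windows that carry the same activity label.

    Two windows are merged when the next one starts at most `gap` after the
    previous one ends. The merged segment keeps the higher score.
    """
    merged = []
    for start, end, label, score in sorted(preds, key=lambda p: p[0]):
        if merged and merged[-1][2] == label and start - merged[-1][1] <= gap:
            prev = merged[-1]
            merged[-1] = (prev[0], max(prev[1], end), label, max(prev[3], score))
        else:
            merged.append((start, end, label, score))
    return merged


# Toy predictions: two heavily overlapping "walk" windows, a third adjacent
# "walk" window, and one "sit" window.
preds = [
    (0, 10, "walk", 0.9),
    (1, 11, "walk", 0.8),
    (10, 20, "walk", 0.7),
    (20, 30, "sit", 0.95),
]
segments = merge_adjacent(nms_1d(preds))
```

In this toy run, the second "walk" window is suppressed because it overlaps the higher-scoring first one, and the two remaining "walk" windows are joined into a single segment.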
### Summary of mathematical formulas

- **Offset calculation**:

\[ f_x = \frac{t_x - w_x}{w_l} \quad (1) \]

\[ f_l = \log\left(\frac{t_l}{w_l}\right) \quad (2) \]

- **Predict activity boundaries**:

\[ \hat{t}_x = f_x w_l + w_x \quad (3) \]

\[ \hat{t}_l = w_l \exp(f_l) \quad (4) \]

- **Loss function**:

\[ L_{loc}(f, \hat{f}) = \sum_{i \in \{x, l\}} \text{Smooth L1}(f_i - \hat{f}_i) \quad (5) \]

where

\[ \text{Smooth L1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} \quad (6) \]

\[ L_{conf}(a, \hat{a}) = -\sum_{i = 1}^{n} a_i \log(\hat{a}_i) \quad (7) \]

\[ L(a, \hat{a}, f, \hat{f}) = \frac{1}{N}\left(\alpha L_{conf}(a, \hat{a}) + \beta L_{loc}(f, \hat{f})\right) \quad (8) \]

Through these methods and formulas, the paper proposes a more effective multi-task learning framework for simultaneously performing sensor-based human activity recognition and segmentation.
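Equations (1) to (8) can be checked numerically with a small pure-Python sketch. It rests on several assumptions that the summary does not state: that the subscript x denotes a position (for example a window or segment center) and l a length, that N in Eq. (8) is the number of windows contributing to the loss, and that the per-window losses are summed before dividing by N. The function names, the `eps` guard inside the logarithm, and the default `alpha = beta = 1` are illustrative only.

```python
import math


def encode_offsets(t_x, t_l, w_x, w_l):
    """Eqs. (1)-(2): offsets of a true segment relative to a window."""
    f_x = (t_x - w_x) / w_l
    f_l = math.log(t_l / w_l)
    return f_x, f_l


def decode_offsets(f_x, f_l, w_x, w_l):
    """Eqs. (3)-(4): recover segment position and length from offsets."""
    t_x = f_x * w_l + w_x
    t_l = w_l * math.exp(f_l)
    return t_x, t_l


def smooth_l1(x):
    """Eq. (6): quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def loc_loss(f, f_hat):
    """Eq. (5): Smooth L1 summed over the x and l offsets."""
    return sum(smooth_l1(fi - fi_hat) for fi, fi_hat in zip(f, f_hat))


def conf_loss(a, a_hat, eps=1e-12):
    """Eq. (7): cross-entropy between one-hot labels a and probabilities a_hat.

    eps is an assumption added here to avoid log(0).
    """
    return -sum(ai * math.log(max(pi, eps)) for ai, pi in zip(a, a_hat))


def total_loss(samples, alpha=1.0, beta=1.0):
    """Eq. (8), assuming N is the number of windows in `samples`.

    Each sample is a tuple (a, a_hat, f, f_hat).
    """
    n = len(samples)
    conf = sum(conf_loss(a, a_hat) for a, a_hat, _, _ in samples)
    loc = sum(loc_loss(f, f_hat) for _, _, f, f_hat in samples)
    return (alpha * conf + beta * loc) / n


# Round trip: a true segment at position 50 with length 40, seen from a
# window at position 45 with length 20.
f_x, f_l = encode_offsets(50, 40, 45, 20)
t_x, t_l = decode_offsets(f_x, f_l, 45, 20)
```

Encoding followed by decoding returns the original segment (up to floating-point error), which shows that Eqs. (3) and (4) are the inverses of Eqs. (1) and (2).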

A Multi-Task Deep Learning Approach for Sensor-based Human Activity Recognition and Segmentation

A Multitask Deep Learning Approach for Sensor-Based Human Activity Recognition and Segmentation

A Boundary Consistency-aware Multi-task Learning Framework for Joint Activity Segmentation and Recognition with Wearable Sensors

Deep Dilation on Multimodality Time Series for Human Activity Recognition.

Hierarchical Multi-View Aggregation Network for Sensor-Based Human Activity Recognition.

A Deep Learning-Based Semantic Segmentation Model Using MCNN and Attention Layer for Human Activity Recognition

Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities

Deep Learning for Sensor-based Activity Recognition: A Survey

Understanding and Improving Deep Neural Network for Activity Recognition

Multi-channel Time Series Decomposition Network For Generalizable Sensor-Based Activity Recognition

A Multi-dimensional Parallel Convolutional Connected Network Based on Multi-source and Multi-modal Sensor Data for Human Activity Recognition

A Hybrid Attention-Based Deep Neural Network for Simultaneous Multi-Sensor Pruning and Human Activity Recognition

Sensor-based Human Activity Recognition Using Graph LSTM and Multi-task Classification Model

A Multidimensional Parallel Convolutional Connected Network Based on Multisource and Multimodal Sensor Data for Human Activity Recognition

Cross-Attention Enhanced Pyramid Multi-Scale Networks for Sensor-based Human Activity Recognition

Multi-Channel Deep Networks on Sequence Data for Multi-Action Recognition

MMTSA: Multimodal Temporal Segment Attention Network for Efficient Human Activity Recognition

A Deep Dilated Convolutional Self-attention Model for Multimodal Human Activity Recognition

Multihead-Res-SE Residual Network with Attention for Human Activity Recognition

AttnSense: Multi-level Attention Mechanism for Multimodal Human Activity Recognition

A Human Action Recognition Model Inspired By Multiple Scale Temporal Segments Model Fusion