Abstract:
BACKGROUND: A daily activity routine is vital for overall health and well-being, supporting both physical and mental fitness. Consistent physical activity is linked to a multitude of benefits for the body, mind, and emotions, and plays a key role in fostering a healthy lifestyle. Wearable devices have become essential in health and fitness because they make it easy to monitor daily activities. While convolutional neural networks (CNNs) have proven effective, challenges remain in adapting quickly to a wide variety of activities.
OBJECTIVE: This study aimed to develop a model for precise human activity recognition using wearable devices by integrating transformer models with multi-head attention, with the aim of revolutionizing health monitoring.
METHODS: The proposed Human Activity Recognition (HAR) algorithm uses deep learning to classify human activities from spectrogram data. It combines a pretrained convolutional neural network (CNN), MobileNetV2, for feature extraction with a dense residual transformer network (DRTN) and a multi-head multi-level attention (MH-MLA) architecture that capture temporal patterns. The model then fuses the information from both layers through an adaptive attention mechanism and applies a softmax function to produce classification probabilities for the various human activities.
RESULTS: The integrated approach, which combines a pretrained CNN with transformer models into a single system for recognizing human activities from spectrogram data, outperformed the compared methods on three datasets, achieving accuracies of 92.81% on HARTH, 97.98% on KU-HAR, and 95.32% on HuGaDB. This suggests that integrating diverse methodologies is effective at capturing nuanced differences between human activities. The comparative analysis showed that the integrated system consistently performs better on dynamic human activity recognition datasets.
CONCLUSION: Maintaining a routine of daily activities is crucial for overall health and well-being, and regular physical activity contributes substantially to a healthy lifestyle, benefiting both body and mind. Wearable devices have simplified the monitoring of daily routines. This research introduces a new approach to human activity recognition that combines a CNN model with a dense residual transformer network (DRTN) and multi-head multi-level attention (MH-MLA) within the transformer architecture to improve recognition performance.
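
The METHODS section describes temporal features that are attended over, fused with CNN features through an adaptive attention mechanism, and passed to a softmax classifier. The sketch below illustrates that classification head in dependency-free Python under explicit assumptions; it is not the authors' implementation. `cnn_feat` stands in for pooled MobileNetV2 features and `tokens` for per-time-step outputs of the transformer branch. It uses plain multi-head self-attention without learned projections rather than the paper's MH-MLA or DRTN. The gating formula in `adaptive_fusion` and all function names are hypothetical, and the weights are random, not trained.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(x: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in w]


def multi_head_self_attention(tokens: List[Vector], num_heads: int) -> List[Vector]:
    """Scaled dot-product self-attention, with the feature dimension split into heads.

    Simplification (assumption): Q = K = V = the tokens themselves, i.e. no learned
    query/key/value or output projections, so each head only mixes its own slice.
    """
    d_model = len(tokens[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    out = [[0.0] * d_model for _ in tokens]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        head = [t[lo:hi] for t in tokens]  # this head's slice of every time step
        for i, q in enumerate(head):
            # Attention weights of time step i over all time steps.
            weights = softmax([dot(q, k) / math.sqrt(d_head) for k in head])
            for j, v in enumerate(head):
                for d in range(d_head):
                    out[i][lo + d] += weights[j] * v[d]
    return out


def adaptive_fusion(cnn_feat: Vector, attn_feat: Vector, gate_w: Vector) -> Vector:
    """Hypothetical adaptive fusion: score each branch, softmax the two scores,
    and return the weighted sum of the two branch vectors."""
    alpha = softmax([dot(gate_w, cnn_feat), dot(gate_w, attn_feat)])
    return [alpha[0] * c + alpha[1] * a for c, a in zip(cnn_feat, attn_feat)]


def classify(cnn_feat: Vector, tokens: List[Vector], gate_w: Vector,
             w_out: Matrix, num_heads: int = 2) -> Vector:
    """Attend over time steps, mean-pool, fuse with CNN features, return class probabilities."""
    attended = multi_head_self_attention(tokens, num_heads)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    fused = adaptive_fusion(cnn_feat, pooled, gate_w)
    return softmax(matvec(w_out, fused))


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, seq_len, num_classes = 8, 5, 4  # toy sizes, not the paper's
    cnn_feat = [rng.gauss(0, 1) for _ in range(d_model)]
    tokens = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]
    gate_w = [rng.gauss(0, 1) for _ in range(d_model)]
    w_out = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(num_classes)]
    probs = classify(cnn_feat, tokens, gate_w, w_out)
    print("class probabilities:", [round(p, 3) for p in probs])
    print("sum:", round(sum(probs), 6))
```

Running the script prints one probability per toy class, summing to 1 up to floating-point rounding. With trained weights, the largest entry would be the predicted activity. Taking a softmax over the two branch scores lets the weighting between CNN and attention features vary per input, which is one plausible reading of "adaptive attention"; the paper's actual fusion mechanism may differ.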

Revolutionizing health monitoring: Integrating transformer models with multi-head attention for precise human activity recognition using wearable devices