Abstract: In the human activity recognition research area, prior studies predominantly concentrate on leveraging advanced algorithms on public datasets to enhance recognition performance, but little attention has been paid to executing real-time kitchen activity recognition on energy-efficient, cost-effective edge devices. Besides, the prevalent approach of segregating data collection and context extraction across different devices escalates power usage, latency, and user privacy risks, impeding widespread adoption. This work presents a multi-modal wearable edge computing system for human activity recognition in real-time. Integrating six different sensors, ranging from inertial measurement units (IMUs) to thermal cameras, and two different microcontrollers, this system achieves end-to-end activity recognition, from data capture to context extraction, locally. Evaluation in an unmodified realistic kitchen validates its efficacy in recognizing fifteen activities, including a null class. Employing a compact machine learning model (184.5 kbytes) yields an average accuracy of 87.83%, with model inference completed in 25.26 ms on the microcontroller. Comparative analysis with alternative microcontrollers showcases power consumption and inference speed performance, demonstrating the proposed system's viability.
What problem does this paper attempt to address?
This paper addresses **real-time activity recognition in the kitchen, specifically on low-power, low-cost edge devices**. Existing research mainly applies advanced algorithms to public datasets to improve recognition performance, and rarely considers real-time kitchen activity recognition on energy-efficient, low-cost edge devices. In addition, existing methods usually split data collection and context extraction across different devices, which increases power consumption and latency, introduces user privacy risks, and hinders widespread adoption.
To solve these problems, the authors propose a **multi-modal wearable edge-computing system** for real-time kitchen activity recognition. The system performs end-to-end local processing, from data collection to context extraction, by integrating six different sensors (including inertial measurement units (IMUs) and thermal cameras) and two different microcontrollers. The key points of the system are:
1. **Hardware Design**:
- The system contains six sensors and two microcontrollers (MCUs), which between them handle data collection, storage, and real-time context extraction.
- The sensors are an optical sensor, a gas sensor, an infrared array, a barometric pressure sensor, an inertial measurement unit (IMU), and a Time-of-Flight (ToF) ranging sensor.
- The selected MCUs are the nRF52840 and the MIMXRT1062, chosen for low power consumption and high performance, respectively.
2. **Neural Network Architecture**:
- Two popular neural network models are used: a multi-channel time-series convolutional neural network (MC-CNN) and DeepConvLSTM.
- To fit within the resource limits of the MCU, a data fusion method is adopted: the data from all channels are concatenated before being fed into the neural network, and one CNN layer is removed to reduce the model size.
3. **Experimental Evaluation**:
- The system was evaluated in an unmodified, realistic kitchen environment. It recognizes 15 activities (including a null class) with an average accuracy of 87.83% and an inference time of 25.26 milliseconds, using a 184.5 kbyte model.
- Performance was compared against alternative MCUs in terms of power consumption and inference speed, supporting the viability of the proposed system.
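The data fusion step described under the neural network architecture (concatenating all sensor channels into one input) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sensor names, the channel counts in `SENSOR_CHANNELS`, the linear resampling to a common window length, and the helper names `resample` and `fuse` are all assumptions made for the example.

```python
from typing import Dict, List

# A window is a list of time steps; each time step is a list of channel values.
Window = List[List[float]]

# Hypothetical sensor names and channel counts (not taken from the paper).
SENSOR_CHANNELS = {
    "imu": 6,
    "barometer": 1,
    "gas": 1,
    "light": 1,
    "tof": 1,
    "ir_array": 64,
}


def resample(window: Window, target_len: int) -> Window:
    """Linearly interpolate a window to target_len time steps."""
    src_len = len(window)
    if src_len == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first sample.
        return [list(window[0]) for _ in range(target_len)]
    out = []
    for i in range(target_len):
        # Fractional position of output step i on the source time axis.
        pos = i * (src_len - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, src_len - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(window[lo], window[hi])])
    return out


def fuse(windows: Dict[str, Window], target_len: int) -> Window:
    """Align every sensor window in time, then concatenate along channels."""
    aligned = [resample(windows[name], target_len) for name in SENSOR_CHANNELS]
    fused = []
    for t in range(target_len):
        row: List[float] = []
        for sensor_window in aligned:
            row.extend(sensor_window[t])
        fused.append(row)
    return fused
```

With these assumed channel counts, `fuse` returns a window with 74 channels per time step, which a single network input layer can consume instead of one input branch per sensor.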
In conclusion, the paper aims to overcome the power consumption, latency, and privacy problems of existing methods by developing a multi-modal wearable edge-computing system that performs efficient real-time kitchen activity recognition on low-power, low-cost edge devices.
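As a rough check on what the reported 25.26 ms inference time means for real-time operation, the arithmetic can be sketched as below. Only the inference time comes from the paper; the window hop of 1000 ms used in the usage note and the helper names are hypothetical, since this summary does not state how often the system runs inference.

```python
INFERENCE_MS = 25.26  # inference time reported in the abstract


def max_inference_rate_hz(inference_ms: float) -> float:
    """Upper bound on inferences per second if the MCU did nothing else."""
    return 1000.0 / inference_ms


def inference_duty_cycle(inference_ms: float, hop_ms: float) -> float:
    """Fraction of each window hop spent on inference.

    hop_ms is an assumed interval between consecutive inferences;
    a result above 1.0 would mean inference cannot keep up.
    """
    return inference_ms / hop_ms
```

`max_inference_rate_hz(INFERENCE_MS)` comes to roughly 39.6 inferences per second, and with an assumed 1000 ms hop, `inference_duty_cycle(INFERENCE_MS, 1000.0)` is about 2.5% of each hop, leaving the remainder for sensing and idle time.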