WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition

Marius Bock, Hilde Kuehne, Kristof Van Laerhoven, Michael Moeller
2024-10-15
Abstract: Research has shown the complementarity of camera- and inertial-based data for modeling human activities, yet datasets with both egocentric video and inertial-based sensor data remain scarce. In this paper, we introduce WEAR, an outdoor sports dataset for both vision- and inertial-based human activity recognition (HAR). Data from 22 participants performing a total of 18 different workout activities was collected at 11 different outdoor locations, with synchronized inertial (acceleration) and camera (egocentric video) recordings. WEAR provides a challenging prediction scenario in changing outdoor environments, using a sensor placement in line with recent trends in real-world applications. Benchmark results show that, with our sensor placement, the two modalities offer complementary strengths and weaknesses in their prediction performance. Further, in light of the recent success of single-stage Temporal Action Localization (TAL) models, we demonstrate their versatility: they can be trained not only on visual data but also on raw inertial data, and they can fuse both modalities by means of simple concatenation. The dataset and the code to reproduce the experiments are publicly available via: http://mariusbock.github.io/wear/
Computer Vision and Pattern Recognition, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses how to combine visual and inertial sensor data effectively for human activity recognition (HAR) in outdoor sports scenarios. The researchers observe that, although human activity recognition has been studied extensively, most existing benchmark datasets provide either camera-based data or inertial-sensor-based data, and datasets offering both remain scarce. Most of these datasets were also collected indoors and therefore do not cover changing outdoor environments. To fill this gap, the researchers introduce WEAR, a human activity recognition dataset designed specifically for outdoor sports. It contains synchronized inertial (acceleration) and camera (egocentric video) data from 22 participants performing 18 different workout activities at 11 different outdoor locations. With this dataset, the researchers evaluate how the strengths of the two modalities can be used in changing outdoor environments, and they examine how recent single-stage temporal action localization (TAL) models perform when trained on raw inertial data and when fusing both modalities. The sensor arrangement in WEAR reflects practical use: it matches the head-mounted cameras and wrist-worn smartwatches that are popular on the current market, so the dataset serves as a test platform closer to real-world application scenarios. The intent is to support more accurate activity recognition and to encourage practical applications of these techniques in fields such as fitness tracking and medical support.
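The abstract states that the TAL models fuse both modalities "by means of simple concatenation". The sketch below illustrates that general idea only and is not the authors' implementation. It assumes that each modality has already been turned into one feature vector per time window and that the windows of both modalities are aligned. The function name `fuse_by_concatenation` and the toy feature values are hypothetical.

```python
from typing import List, Sequence


def fuse_by_concatenation(
    visual: Sequence[Sequence[float]],
    inertial: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Join per-window feature vectors of two modalities along the feature axis.

    Assumption: visual[i] and inertial[i] describe the same time window.
    """
    if len(visual) != len(inertial):
        raise ValueError("both modalities must cover the same number of windows")
    # For each window, append the inertial features to the visual features.
    return [list(v) + list(a) for v, a in zip(visual, inertial)]


# Toy example: 2 windows, 3 visual features and 2 inertial features per window.
visual_features = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
inertial_features = [[1.0, 2.0], [3.0, 4.0]]

fused = fuse_by_concatenation(visual_features, inertial_features)
print(len(fused), len(fused[0]))
```

Each fused window carries the features of both modalities side by side, so a model that accepts a sequence of feature vectors can consume it without structural changes beyond a wider input dimension.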