IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition

Zikang Leng, Amitrajit Bhattacharjee, Hrudhai Rajasekhar, Lizhe Zhang, Elizabeth Bruda, Hyeokhyen Kwon, Thomas Plötz
2024-02-02
Abstract: One of the primary challenges in the field of human activity recognition (HAR) is the lack of large labeled datasets. This hinders the development of robust and generalizable models. Recently, cross modality transfer approaches have been explored that can alleviate the problem of data scarcity. These approaches convert existing datasets from a source modality, such as video, to a target modality (IMU). With the emergence of generative AI models such as large language models (LLMs) and text-driven motion synthesis models, language has become a promising source data modality as well, as shown in proofs of concept such as IMUGPT. In this work, we conduct a large-scale evaluation of language-based cross modality transfer to determine its effectiveness for HAR. Based on this study, we introduce two new extensions for IMUGPT that enhance its use for practical HAR application scenarios: a motion filter capable of filtering out irrelevant motion sequences to ensure the relevance of the generated virtual IMU data, and a set of metrics that measure the diversity of the generated data, facilitating the determination of when to stop generating virtual IMU data for both effective and efficient processing. We demonstrate that our diversity metrics can reduce the effort needed for the generation of virtual IMU data by at least 50%, which opens up IMUGPT for practical use cases beyond a mere proof of concept.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem that this paper attempts to solve is **the lack of large-scale labeled datasets**, a long-standing challenge in the field of human activity recognition (HAR). Specifically, the paper focuses on the following aspects:

1. **The stopping time of data generation**:
   - The initial IMUGPT prototype had no clear way to determine when the generation of virtual IMU data should stop, which hurt computational efficiency and cost. To optimize this process, the paper introduces **Diversity Metrics**. These metrics identify the saturation point of data generation, that is, the point beyond which generating more data no longer adds meaningful information to the existing dataset. Generation can then be stopped automatically, saving time and computational resources.
2. **The relevance of generated data**:
   - The initial IMUGPT prototype could not ensure that the generated virtual IMU data was relevant to the target activity, which could yield inaccurate data or even harm downstream HAR applications. For this reason, the paper introduces a **Motion Filter** that identifies and filters out motion sequences that do not accurately represent the specified activity. This helps improve the performance of downstream classifiers and reduces the influence of noise.
3. **Specific methods of diversity metrics**:
   - The paper proposes two diversity metrics, one for the generated text descriptions and one for the generated motion sequences. With these metrics, the diversity of the generated data can be evaluated and used to guide the data generation process.
4. **Specific implementation of the motion filter**:
   - The motion filter is implemented as a pipeline that identifies and filters out motion sequences that do not accurately represent the specified activity. This helps ensure the quality of the generated virtual IMU data and avoids introducing noise.

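The saturation idea behind the diversity metrics can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not specify the metrics, so the score used here (mean pairwise Euclidean distance), the function names, and the `rel_tol` and `patience` parameters are all hypothetical choices made only for illustration. The sketch pools generated samples batch by batch and reports the first batch at which the diversity score has stopped changing noticeably.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Hypothetical diversity score: average Euclidean distance over all pairs."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def find_saturation_batch(batches, rel_tol=0.05, patience=2):
    """Return the 1-based index of the batch at which generation could stop.

    Generation is considered saturated once the relative change of the
    diversity score stays below `rel_tol` for `patience` consecutive batches.
    Returns None if that never happens. Both thresholds are assumptions.
    """
    pool = []          # all samples generated so far
    previous = None    # diversity score after the previous batch
    stable = 0         # consecutive batches with a small relative change
    for index, batch in enumerate(batches, start=1):
        pool.extend(batch)
        score = mean_pairwise_distance(pool)
        if previous is not None and previous > 0:
            change = abs(score - previous) / previous
            stable = stable + 1 if change < rel_tol else 0
            if stable >= patience:
                return index
        previous = score
    return None


# Toy data: every batch repeats the same four points, so later batches
# add no new information and the score levels off.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
repeated_batches = [list(square) for _ in range(6)]
stop_at = find_saturation_batch(repeated_batches)
```

In a real pipeline the points would presumably be embeddings of the generated text descriptions or motion sequences, which is again an assumption; the toy data above only demonstrates the plateau-detection logic.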
Through these extensions and improvements, the paper aims to make the language-driven cross-modal transfer method more practical, thereby providing stronger support for actual HAR application scenarios. Specifically, these improvements can help researchers generate and utilize virtual IMU data more efficiently and improve the robustness and accuracy of HAR models, while saving time and computational resources.