SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition

Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, Flora D. Salim
2024-10-14
Abstract: In this work, we bridge the gap between wearable sensor technology and personalized AI assistants by enabling Large Language Models (LLMs) to understand time-series tasks like human activity recognition (HAR). Despite the strong reasoning and generalization capabilities of LLMs, leveraging them for sensor data tasks remains largely unexplored. This gap stems from challenges like the lack of semantic context in time-series data, computational limitations, and LLMs' difficulty processing numerical inputs. To address these issues, we introduce SensorLLM, a two-stage framework to unlock LLMs' potential for sensor data tasks. In the Sensor-Language Alignment Stage, we introduce special tokens for each sensor channel and automatically generate trend-descriptive text to align sensor data with textual inputs, enabling SensorLLM to capture numerical changes, channel-specific information, and sensor data of varying lengths (capabilities that existing LLMs typically struggle with), all without the need for human annotations. Next, in the Task-Aware Tuning Stage, we refine the model for HAR classification using the frozen LLM and alignment module, achieving performance on par with or surpassing state-of-the-art models. We further demonstrate that SensorLLM evolves into an effective sensor learner, reasoner, and classifier through Sensor-Language Alignment, enabling it to generalize across diverse datasets for HAR tasks. We strongly believe our work lays the stepping stone for future time-series and text alignment research, offering a path toward foundation models for sensor data.
Computation and Language
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses how to combine large language models (LLMs) with wearable sensor technology so that they can understand and process time-series tasks such as human activity recognition (HAR). Although LLMs excel at reasoning and generalization, they face several challenges when handling sensor data:

1. **Lack of Semantic Context**: Time-series data carries little semantic information, which makes it difficult for LLMs to interpret.
2. **Computational Limitations**: Pre-training or fine-tuning LLMs to handle time-series data directly requires enormous computational resources.
3. **Difficulty in Handling Numerical Inputs**: LLMs struggle with numerical inputs because their tokenizers are designed for text, not numerical data.

To address these issues, the authors propose a two-stage framework called SensorLLM:

1. **Sensor-Language Alignment Stage**: Special tokens are introduced for each sensor channel, and trend-descriptive text is generated automatically, so that sensor data is aligned with textual input. This allows SensorLLM to capture numerical changes, channel-specific information, and sensor data of varying lengths without human annotation.
2. **Task-Aware Tuning Stage**: With the LLM and alignment module kept frozen, the model is refined for HAR classification, achieving performance on par with or surpassing state-of-the-art models.

Through these two stages, SensorLLM not only handles sensor data effectively but also generalizes across diverse HAR datasets. The authors believe this work lays the foundation for future research on time-series and text alignment and offers a path toward foundation models for sensor data.
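To make the idea of automatically generated trend-descriptive text more concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation: the function names `segment_trends` and `describe_channel`, the `<acc_x>`-style channel tag, the `eps` threshold, and the sentence template are all assumptions introduced for this example. The sketch labels each step of a single sensor channel as increasing, decreasing, or steady, merges consecutive steps with the same label, and renders the result as text prefixed with a channel-specific tag.

```python
from typing import List, Tuple


def segment_trends(values: List[float], eps: float = 1e-6) -> List[Tuple[int, int, str]]:
    """Group consecutive steps with the same direction into (start, end, label) segments.

    `eps` is a hypothetical tolerance below which a change counts as "steady".
    """
    segments: List[Tuple[int, int, str]] = []
    for i in range(1, len(values)):
        delta = values[i] - values[i - 1]
        if delta > eps:
            label = "increasing"
        elif delta < -eps:
            label = "decreasing"
        else:
            label = "steady"
        if segments and segments[-1][2] == label:
            # Same direction as the previous step: extend the current segment.
            start, _, _ = segments[-1]
            segments[-1] = (start, i, label)
        else:
            # Direction changed: open a new segment.
            segments.append((i - 1, i, label))
    return segments


def describe_channel(name: str, values: List[float]) -> str:
    """Render one channel's trends as text, prefixed with a hypothetical channel tag."""
    parts = [
        f"{label} from step {start} to step {end}"
        for start, end, label in segment_trends(values)
    ]
    return f"<{name}> " + "; ".join(parts) + "."


if __name__ == "__main__":
    # Toy accelerometer x-axis readings (made-up values for illustration).
    print(describe_channel("acc_x", [0.0, 1.0, 2.0, 2.0, 1.0]))
```

Because the description is derived purely from the numbers, text of this kind can be produced for any channel and any window length without human labeling, which is the property the alignment stage relies on. How the paper itself tokenizes channels and phrases the descriptions may differ from this sketch.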