SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition

Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, Flora D. Salim

2024-10-14

Abstract: In this work, we bridge the gap between wearable sensor technology and personalized AI assistants by enabling Large Language Models (LLMs) to understand time-series tasks like human activity recognition (HAR). Despite the strong reasoning and generalization capabilities of LLMs, leveraging them for sensor data tasks remains largely unexplored. This gap stems from challenges like the lack of semantic context in time-series data, computational limitations, and LLMs' difficulty processing numerical inputs. To address these issues, we introduce SensorLLM, a two-stage framework to unlock LLMs' potential for sensor data tasks. In the Sensor-Language Alignment Stage, we introduce special tokens for each sensor channel and automatically generate trend-descriptive text to align sensor data with textual inputs. This enables SensorLLM to capture numerical changes, channel-specific information, and sensor data of varying lengths, capabilities that existing LLMs typically struggle with, all without the need for human annotations. Next, in the Task-Aware Tuning Stage, we refine the model for HAR classification using the frozen LLM and alignment module, achieving performance on par with or surpassing state-of-the-art models. We further demonstrate that SensorLLM evolves into an effective sensor learner, reasoner, and classifier through Sensor-Language Alignment, enabling it to generalize across diverse datasets for HAR tasks. We strongly believe our work lays the groundwork for future time-series and text alignment research, offering a path toward foundation models for sensor data.

Computation and Language

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper addresses how to combine large language models (LLMs) with wearable sensor technology so that they can understand and process time-series tasks such as human activity recognition (HAR). Although LLMs excel at reasoning and generalization, they face several challenges when handling sensor data:

1. **Lack of Semantic Context**: Time-series data often lacks rich semantic information, making it difficult for LLMs to interpret.
2. **Computational Limitations**: Pre-training or fine-tuning LLMs to handle time-series data directly requires enormous computational resources.
3. **Difficulty in Handling Numerical Inputs**: LLMs have difficulty processing numerical inputs because their tokenizers are designed for text, not numerical data.

To address these issues, the authors propose a two-stage framework called SensorLLM:

1. **Sensor-Language Alignment Stage**: Special tokens are introduced for each sensor channel, and trend-descriptive text is generated automatically, so that sensor data is aligned with textual input. This allows SensorLLM to capture numerical changes, channel-specific information, and sensor data of varying lengths without human annotation.
2. **Task-Aware Tuning Stage**: The model is refined for HAR classification while the LLM and the alignment module are kept frozen, achieving performance on par with or surpassing existing state-of-the-art models.

Through these two stages, SensorLLM not only handles sensor data effectively but also generalizes across diverse HAR datasets. The authors believe that this work lays the foundation for future research on time-series and text alignment and provides a pathway toward foundation models for sensor data.
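
The Sensor-Language Alignment idea of pairing each channel with automatically generated trend text can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `tol` threshold, the sentence template, and the `<x_acc_start>`-style token format are all hypothetical, chosen only to show how trend descriptions could be derived from raw readings without human annotation.

```python
from typing import List, Tuple


def trend_segments(values: List[float], tol: float = 1e-3) -> List[Tuple[int, int, str]]:
    """Split a series into maximal runs of increasing, decreasing, or steady steps.

    Returns (start_index, end_index, label) tuples. `tol` is an assumed
    threshold below which a change counts as "steady".
    """
    segments: List[Tuple[int, int, str]] = []
    for i in range(1, len(values)):
        delta = values[i] - values[i - 1]
        if delta > tol:
            label = "increasing"
        elif delta < -tol:
            label = "decreasing"
        else:
            label = "steady"
        if segments and segments[-1][2] == label:
            # Extend the current run instead of starting a new one.
            start, _, _ = segments[-1]
            segments[-1] = (start, i, label)
        else:
            segments.append((i - 1, i, label))
    return segments


def describe_channel(name: str, values: List[float], hz: float) -> str:
    """Render the trend segments of one channel as plain text (hypothetical template)."""
    parts = [
        f"from {start / hz:.2f}s to {end / hz:.2f}s the signal is {label}"
        for start, end, label in trend_segments(values)
    ]
    return f"{name}: " + "; ".join(parts) + "."


def channel_tokens(name: str) -> Tuple[str, str]:
    """Build a pair of per-channel marker tokens (assumed naming scheme)."""
    return f"<{name}_start>", f"<{name}_end>"


if __name__ == "__main__":
    readings = [0.0, 1.0, 2.0, 1.0, 1.0]  # toy x-axis accelerometer samples at 1 Hz
    print(channel_tokens("x_acc"))
    print(describe_channel("x_acc", readings, hz=1.0))
```

In a setup like this, the marker tokens would wrap the embedded sensor segment in the LLM input, and the generated sentence would serve as the alignment target; how SensorLLM actually encodes the sensor data and phrases its descriptions is not specified in this summary.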

LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors

Language-centered Human Activity Recognition

Large Language Models Memorize Sensor Datasets! Implications on Human Activity Recognition Research

Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling: A Survey of Early Trends, Datasets, and Challenges

Online Continual Learning for Human Activity Recognition

Towards LLM-Powered Ambient Sensor Based Multi-Person Human Activity Recognition

Evaluating Large Language Models as Virtual Annotators for Time-series Physical Sensing Data

LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces

Attention-based LSTM Network for Wearable Human Activity Recognition

Multidimensional Human Activity Recognition With Large Language Model: A Conceptual Framework

TR-LLM: Integrating Trajectory Data for Scene-Aware LLM-Based Human Action Prediction

How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey

Large Language Models are Zero-Shot Recognizers for Activities of Daily Living

MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Integration of LLMs and the Physical World: Research and Application

Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors

Aligning Actions and Walking to LLM-Generated Textual Descriptions

Aligning Large Language Models with Human: A Survey

Large Language Model Alignment: A Survey