Abstract: Leveraging large language models (LLMs) has garnered increasing attention and introduced novel perspectives in time series classification. However, existing approaches often overlook the crucial dynamic temporal information inherent in time series data and face challenges in aligning this data with textual semantics. To address these limitations, we propose HiTime, a hierarchical multi-modal model that seamlessly integrates temporal information into LLMs for multivariate time series classification (MTSC). Our model employs a hierarchical feature encoder to capture diverse aspects of time series data through both data-specific and task-specific embeddings. To facilitate semantic space alignment between time series and text, we introduce a dual-view contrastive alignment module that bridges the gap between modalities. Additionally, we adopt a hybrid prompting strategy to fine-tune the pre-trained LLM in a parameter-efficient manner. By effectively incorporating dynamic temporal features and ensuring semantic alignment, HiTime enables LLMs to process continuous time series data and achieves state-of-the-art classification performance through text generation. Extensive experiments on benchmark datasets demonstrate that HiTime significantly enhances time series classification accuracy compared to most competitive baseline methods. Our findings highlight the potential of integrating temporal features into LLMs, paving the way for advanced time series analysis. The code is publicly available for further research and validation.

### What problems does this paper attempt to solve?

This paper targets two main challenges that existing LLM-based methods face when classifying multivariate time series:

1. **Neglect of dynamic temporal information**: Existing time-series classification methods based on large language models (LLMs) often overlook the rich dynamic temporal information inherent in time series data. These models usually rely on discrete text tokens and cannot fully capture the complex dynamic features of a time series.
2. **Difficulty in semantic alignment between modalities**: Time series data and text representations are hard to align semantically, so a model may fail to capture the temporal dependencies that accurate classification relies on. This misalignment degrades performance because the dynamic features of the time series are not fully used.

To solve these problems, the authors propose HiTime, a hierarchical multimodal model that improves time-series classification in the following ways:

- **Hierarchical feature encoding**: A hierarchical feature encoder extracts multi-level feature representations from time series data, including data-specific and task-specific embeddings. This lets the model retain the key dynamic characteristics of the time series while adapting to the specific classification task.
- **Dual-view contrastive alignment module**: A dual-view contrastive alignment module bridges the semantic gap between time series data and text. Aligning time-series and text embeddings in a shared latent space improves the model's understanding and generation ability.
- **Hybrid prompting strategy**: A hybrid prompting strategy fine-tunes the pre-trained LLM in a parameter-efficient manner, enabling it to handle continuous time series data and produce classification outputs through text generation.

Through these innovations, HiTime both integrates dynamic temporal features and aligns the time-series and text modalities semantically, thereby significantly improving time-series classification accuracy.

### Formula summary

- **Embedding concatenation for hierarchical feature encoding**:

  \[ Z=\text{Concat}\left(\text{Encoder}_d(X), \text{Encoder}_s(X)\right) \]

  where \(X\) is the input instance, \(\text{Encoder}_d(\cdot)\) and \(\text{Encoder}_s(\cdot)\) are the data-specific and task-specific encoders respectively, \(\text{Concat}(\cdot)\) is the concatenation operation, and \(Z\) is the concatenated encoder output.

- **Fine-grained alignment loss**:

  \[ L_{\text{fine}} = -\frac{1}{|D|}\left(\sum_{(e_c, e_t)\in D^+}\log\hat{y}+\sum_{(e_c, e_t)\in D^-}\log(1 - \hat{y})\right) \]

  where \(\hat{y}=F_c(e_c\oplus e_t)\), \(\oplus\) denotes concatenation, and \(F_c(\cdot)\) is a learnable mapping function that maps the concatenated vector to a scalar probability.

- **Coarse-grained alignment loss**:

  \[ L_{\text{coarse}}=-\frac{1}{|D|}\left(\sum_{(e_c, e_t)\in D^+}\log F(e_c, e_t)+\sum_{(e_c, e_t)\in D^-}\log\left(1 - F(e_c, e_t)\right)\right) \]

  where \(F(e_c, e_t)=\text{Sigmoid}(e_c e_t^{\top})\).

- **Total loss function**:

  \[ L=\alpha L_{\text{coarse}}+\beta L_{\text{fine}} \]

  where \(\alpha\) and \(\beta\) weight the coarse-grained and fine-grained loss terms, respectively.
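To make the alignment losses concrete, the following is a minimal pure-Python sketch of the coarse-grained loss, the fine-grained loss, and their weighted sum as written in the formulas above. It is not the authors' implementation. Several details are assumptions made for illustration: embeddings are plain lists of floats, \(|D|\) is taken to be the total number of positive and negative pairs, and \(F_c\) is modelled as a single hypothetical linear layer (`weights`, `bias`) followed by a sigmoid, since the summary only says that \(F_c\) is learnable.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(Sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def coarse_loss(pos_pairs, neg_pairs):
    """L_coarse with F(e_c, e_t) = Sigmoid(e_c . e_t).

    Uses log(1 - Sigmoid(x)) = log(Sigmoid(-x)) for the negative pairs.
    Assumption: |D| is the number of positive plus negative pairs.
    """
    n_pairs = len(pos_pairs) + len(neg_pairs)
    total = sum(log_sigmoid(dot(e_c, e_t)) for e_c, e_t in pos_pairs)
    total += sum(log_sigmoid(-dot(e_c, e_t)) for e_c, e_t in neg_pairs)
    return -total / n_pairs


def fine_loss(pos_pairs, neg_pairs, weights, bias=0.0):
    """L_fine with y_hat = F_c(e_c concatenated with e_t).

    Assumption: F_c is a hypothetical linear layer plus sigmoid; the
    source only states that F_c is a learnable mapping.
    """
    def logit(e_c, e_t):
        return dot(weights, list(e_c) + list(e_t)) + bias

    n_pairs = len(pos_pairs) + len(neg_pairs)
    total = sum(log_sigmoid(logit(e_c, e_t)) for e_c, e_t in pos_pairs)
    total += sum(log_sigmoid(-logit(e_c, e_t)) for e_c, e_t in neg_pairs)
    return -total / n_pairs


def total_loss(pos_pairs, neg_pairs, weights, bias=0.0, alpha=1.0, beta=1.0):
    """L = alpha * L_coarse + beta * L_fine."""
    return (alpha * coarse_loss(pos_pairs, neg_pairs)
            + beta * fine_loss(pos_pairs, neg_pairs, weights, bias))


if __name__ == "__main__":
    # Toy 2-dimensional embeddings: one matching pair, one mismatched pair.
    pos = [([1.0, 0.0], [1.0, 0.0])]
    neg = [([1.0, 0.0], [-1.0, 0.0])]
    w = [0.5, 0.0, 0.5, 0.0]  # hypothetical F_c weights (length 2 + 2)
    print(total_loss(pos, neg, w, alpha=1.0, beta=1.0))
```

Both losses are binary cross-entropy terms over matched and mismatched pairs; they differ only in how the pair is scored (an inner product for the coarse view, a learned function of the concatenated pair for the fine view). In practice these would be computed with a tensor library and trained by gradient descent, which this sketch leaves out.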

Hierarchical Multimodal LLMs with Semantic Space Alignment for Enhanced Time Series Classification
