LETS-C: Leveraging Language Embedding for Time Series Classification

Rachneet Kaur, Zhen Zeng, Tucker Balch, Manuela Veloso
2024-07-09
Abstract: Recent advancements in language modeling have shown promising results when applied to time series data. In particular, fine-tuning pre-trained large language models (LLMs) for time series classification tasks has achieved state-of-the-art (SOTA) performance on standard benchmarks. However, these LLM-based models have a significant drawback due to the large model size, with the number of trainable parameters in the millions. In this paper, we propose an alternative approach to leveraging the success of language modeling in the time series domain. Instead of fine-tuning LLMs, we utilize a language embedding model to embed time series and then pair the embeddings with a simple classification head composed of convolutional neural networks (CNN) and multilayer perceptron (MLP). We conducted extensive experiments on well-established time series classification benchmark datasets. We demonstrated LETS-C not only outperforms the current SOTA in classification accuracy but also offers a lightweight solution, using only 14.5% of the trainable parameters on average compared to the SOTA model. Our findings suggest that leveraging language encoders to embed time series data, combined with a simple yet effective classification head, offers a promising direction for achieving high-performance time series classification while maintaining a lightweight model architecture.
Machine Learning; Artificial Intelligence; Computational Engineering, Finance, and Science; Computation and Language; Methodology
What problem does this paper attempt to address?
This paper addresses the cost of using language models for time series classification. Fine-tuned pre-trained LLMs reach state-of-the-art accuracy on standard benchmarks, but their size (millions of trainable parameters) and high computational cost limit their use in resource-constrained environments. The authors propose LETS-C (Language Embeddings for Time Series Classification), which avoids fine-tuning an LLM: a language embedding model transforms each time series into a vector, and a simple classification head built from a Convolutional Neural Network (CNN) and a Multilayer Perceptron (MLP) classifies that vector. The embeddings are intended to capture the complex patterns and dependencies in the series, so the head itself can stay small. In comparisons on 10 benchmark datasets from different domains, LETS-C achieves higher classification accuracy than the current state-of-the-art model while using, on average, only 14.5% of its trainable parameters, a reduction of roughly 85.5%. The paper also analyzes the effectiveness of different text embedding models, the advantages of time series embeddings, and the trade-off between model size and accuracy. Overall, the results suggest that pairing language embeddings with a lightweight head can deliver high accuracy in time series classification with far fewer trainable parameters.
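The pipeline described above (serialize a series, embed it with a language embedding model, classify the embedding with a small CNN + MLP head) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: `series_to_text`, `toy_embed`, and `ToyHead` are hypothetical names, the comma-separated text format is an assumption, the "embedding" is a hash-seeded stand-in for a real embedding model, and the head uses random, untrained weights. It only shows how the pieces fit together and why the trainable part can be small.

```python
import hashlib
import math
import random


def series_to_text(series, decimals=2):
    """Serialize a numeric series as comma-separated text (assumed format)."""
    return ",".join(f"{x:.{decimals}f}" for x in series)


def toy_embed(text, dim=16):
    """Hypothetical stand-in for a language embedding model.

    Returns a deterministic pseudo-embedding seeded by a hash of the text;
    a real system would call an actual text embedding model here.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def conv1d_relu(x, kernels, biases):
    """Valid 1D convolution over a single-channel input, followed by ReLU."""
    out = []
    for kernel, bias in zip(kernels, biases):
        k = len(kernel)
        out.append([
            max(0.0, bias + sum(kernel[j] * x[i + j] for j in range(k)))
            for i in range(len(x) - k + 1)
        ])
    return out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyHead:
    """Untrained CNN + single-layer MLP head with random weights (illustrative)."""

    def __init__(self, n_filters=4, kernel_size=3, n_classes=3, seed=0):
        rng = random.Random(seed)
        self.kernels = [[rng.gauss(0, 0.5) for _ in range(kernel_size)]
                        for _ in range(n_filters)]
        self.conv_biases = [0.0] * n_filters
        self.weights = [[rng.gauss(0, 0.5) for _ in range(n_filters)]
                        for _ in range(n_classes)]
        self.biases = [0.0] * n_classes

    def num_parameters(self):
        """Count the parameters of the head, the only part that would be trained."""
        conv = sum(len(k) for k in self.kernels) + len(self.conv_biases)
        dense = sum(len(row) for row in self.weights) + len(self.biases)
        return conv + dense

    def predict_proba(self, embedding):
        feature_maps = conv1d_relu(embedding, self.kernels, self.conv_biases)
        pooled = [max(channel) for channel in feature_maps]  # global max pooling
        logits = [b + sum(w * f for w, f in zip(row, pooled))
                  for row, b in zip(self.weights, self.biases)]
        return softmax(logits)


series = [0.12, 0.35, 0.31, 0.80, 0.77]
text = series_to_text(series)
embedding = toy_embed(text)          # embedding model is not trained in this sketch
head = ToyHead()
probs = head.predict_proba(embedding)
```

In this sketch the embedding step contributes no trainable parameters, so the whole trainable budget is the head (`head.num_parameters()`), which mirrors the design idea of keeping the learned component lightweight.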