SiamTST: A Novel Representation Learning Framework for Enhanced Multivariate Time Series Forecasting applied to Telco Networks

Simen Kristoffersen, Peter Skaar Nordby, Sara Malacarne, Massimiliano Ruocco, Pablo Ortiz
2024-07-02
Abstract: We introduce SiamTST, a novel representation learning framework for multivariate time series. SiamTST integrates a Siamese network with attention, channel-independent patching, and normalization techniques to achieve superior performance. Evaluated on a real-world industrial telecommunication dataset, SiamTST demonstrates significant improvements in forecasting accuracy over existing methods. Notably, a simple linear network also shows competitive performance, achieving the second-best results, just behind SiamTST. The code is available at https://github.com/simenkristoff/SiamTST.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the challenges of multivariate time series (MTS) analysis, particularly in the context of telecommunication networks. Specifically, it proposes a new framework, SiamTST, to improve forecasting performance on multivariate time series.

#### Main Contributions

1. **Novel Architecture**: The SiamTST architecture combines attention mechanisms with a Siamese network to improve representation learning for MTS.
2. **Implementation**: A PyTorch implementation of SiamTST is released in a public code repository.
3. **Comparative Analysis**: SiamTST is extensively validated on a large-scale telecommunication dataset, where it shows significant improvements in forecasting accuracy.
4. **Pre-training**: The impact of pre-training on model performance is investigated.

### Abstract and Background

The paper introduces SiamTST, a new representation learning framework for multivariate time series. The framework combines a Siamese network, attention, channel-independent patching, and normalization techniques. Evaluated on a real-world industrial telecommunication dataset, SiamTST shows significant improvements in forecasting accuracy over existing methods. Notably, a simple linear network is also competitive, achieving the second-best results, just behind SiamTST.

### Methodology

1. **Problem Definition**: Given an MTS input X = [x₁, x₂, ..., xₗ] ∈ ℝᴺˣᴸ, where N is the number of variables and L is the number of time steps, the goal is to learn a representation Z = [z₁, z₂, ..., zₗ] ∈ ℝᴰˣᴸ such that using Z yields better results on specific tasks.
2. **Architecture Overview**:
   - **Siamese Time Series Transformer (SiamTST)**: Inspired by PatchTST but modified to better suit pre-training.
   - **Channel Independence and Patching**: Splits the MTS into N univariate time series and divides each of them into non-overlapping patches.
   - **Backbone Structure**: The core is a Transformer encoder that uses pre-normalization and RMSNorm instead of LayerNorm.

### Experimental Setup

1. **Dataset**: Data from Telenor Denmark, containing key performance indicators of base stations across Denmark.
2. **Evaluation Methods**:
   - **Linear Network Prediction**: Used as a baseline reference.
   - **Ridge Regression Prediction**: Predictions made using the same method.
3. **Experiments**:
   - **E1: Comparison with SOTA Methods**: SiamTST is compared against a set of state-of-the-art (SOTA) models: TS2Vec, CoST, SimTS, and PatchTST.
   - **E2: Pre-training**: The SiamTST backbone is pre-trained on different numbers of base stations, then fine-tuned and used for forecasting.

### Results and Discussion

1. **Comparison with SOTA Methods**: Table 2 shows that SiamTST outperforms the other benchmark methods across all forecast horizons, with the gap widening as the horizon increases.
2. **Pre-training**: Table 3 shows the effect of pre-training SiamTST on different numbers of base stations, indicating that pre-training on data from more base stations significantly improves model performance.
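
To make two of the building blocks named under Methodology more concrete, the following is a minimal NumPy sketch of channel-independent, non-overlapping patching and of RMSNorm. It is not the authors' PyTorch code: the function names `patchify` and `rms_norm`, the patch length, the `eps` value, and the choice to drop trailing time steps that do not fill a whole patch are all assumptions made for illustration.

```python
import numpy as np


def patchify(x: np.ndarray, patch_len: int) -> np.ndarray:
    """Split an (N, L) series into non-overlapping patches per channel.

    Each of the N variables is treated as its own univariate series
    (channel independence). Returns an array of shape
    (N, L // patch_len, patch_len). Trailing steps that do not fill a
    whole patch are dropped, which is an assumption of this sketch.
    """
    n_vars, n_steps = x.shape
    n_patches = n_steps // patch_len
    if n_patches == 0:
        raise ValueError("patch_len is longer than the series")
    trimmed = x[:, : n_patches * patch_len]
    return trimmed.reshape(n_vars, n_patches, patch_len)


def rms_norm(h: np.ndarray, gain: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """RMSNorm over the last axis: h / sqrt(mean(h^2) + eps) * gain.

    Unlike LayerNorm, no mean is subtracted and no bias is added.
    """
    rms = np.sqrt(np.mean(h ** 2, axis=-1, keepdims=True) + eps)
    return h / rms * gain


# Toy input: N = 2 variables, L = 8 time steps (made-up values).
x = np.arange(16, dtype=float).reshape(2, 8)
patches = patchify(x, patch_len=4)            # shape (2, 2, 4)
normed = rms_norm(patches, gain=np.ones(4))   # same shape as patches
```

In a pre-normalization Transformer block, a normalization of this kind is applied to the input of each attention and feed-forward sub-layer rather than to its output; the sketch shows only the normalization itself.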