SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling

Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang, Mingsheng Long
2023-10-23
Abstract: Time series analysis is widely used in extensive areas. Recently, to reduce labeling expenses and benefit various tasks, self-supervised pre-training has attracted immense interest. One mainstream paradigm is masked modeling, which successfully pre-trains deep models by learning to reconstruct the masked content based on the unmasked part. However, since the semantic information of time series is mainly contained in temporal variations, the standard way of randomly masking a portion of time points will seriously ruin vital temporal variations of time series, making the reconstruction task too difficult to guide representation learning. We thus present SimMTM, a Simple pre-training framework for Masked Time-series Modeling. By relating masked modeling to manifold learning, SimMTM proposes to recover masked time points by the weighted aggregation of multiple neighbors outside the manifold, which eases the reconstruction task by assembling ruined but complementary temporal variations from multiple masked series. SimMTM further learns to uncover the local structure of the manifold, which is helpful for masked modeling. Experimentally, SimMTM achieves state-of-the-art fine-tuning performance compared to the most advanced time series pre-training methods in two canonical time series analysis tasks: forecasting and classification, covering both in- and cross-domain settings.
Machine Learning
What problem does this paper attempt to address?
This paper addresses self-supervised pre-training for time series analysis. It proposes SimMTM, a simple pre-training framework for masked time-series modeling that is tailored to the characteristics of time series data. The core question is how to use unlabeled time series effectively during pre-training so that downstream tasks, such as forecasting and classification, perform better.

### Main Problems the Paper Attempts to Solve

1. **Effectively Utilizing the Features of Time Series**: Unlike text and images, the key information in a time series lies in its temporal variations, such as trends, periodicity, and peaks. Randomly masking time points therefore disrupts these variations and makes it very difficult to reconstruct the masked content from the remaining unmasked part.
2. **Proposing a New Pre-training Task**: To overcome this issue, SimMTM pre-trains by recovering the original time series from multiple randomly masked versions of it. The masked series are each damaged but complementary, so combining them better preserves the key temporal variations and also encourages the model to learn the local structure of the time series manifold.
3. **Improving the Performance of Time Series Analysis Tasks**: With this design, SimMTM aims to improve a range of time series analysis tasks, including low-level forecasting and high-level classification, in both in-domain and cross-domain settings.

### Technical Contributions

- **New Masked Modeling Task**: SimMTM recovers the original time series from multiple randomly masked time series, which helps preserve important temporal dynamics.
- **Simple and Effective Pre-training Framework**: SimMTM reconstructs a series by aggregating point-level representations, with aggregation weights based on similarities learned in the sequence-level representation space. It also introduces a constraint loss to guide sequence-level representation learning.
- **Experimental Validation**: SimMTM achieves state-of-the-art fine-tuning performance on two canonical time series analysis tasks, forecasting and classification, even in the presence of significant domain gaps.

In summary, the paper identifies why standard random-masking reconstruction is poorly suited to time series, proposes the SimMTM pre-training framework to address this, and demonstrates its effectiveness on forecasting and classification.
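
As a rough illustration of the aggregation-based reconstruction described above, the following NumPy sketch builds several randomly masked copies of one series, weights them by sequence-level similarity to the original, and decodes the weighted sum of their point-level representations. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the single-layer `tanh` encoder, mean pooling, the linear decoder, the temperature `tau`, and the restriction to masked copies of a single series are all hypothetical choices made for illustration, and training (including the constraint loss) is omitted.

```python
import numpy as np


def random_mask(series, mask_ratio, rng):
    """Return a copy of `series` (T, C) with a random fraction of time points zeroed."""
    keep = rng.random(series.shape[0]) >= mask_ratio
    return series * keep[:, None]


def encode(series, weight):
    """Toy point-level encoder (assumption): one linear map followed by tanh."""
    return np.tanh(series @ weight)  # shape (T, D)


def series_repr(point_repr):
    """Toy sequence-level representation (assumption): mean over time."""
    return point_repr.mean(axis=0)  # shape (D,)


def aggregation_weights(query, keys, tau=0.2):
    """Softmax over cosine similarities between `query` (D,) and `keys` (M, D)."""
    q = query / (np.linalg.norm(query) + 1e-8)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    logits = (k @ q) / tau
    logits = logits - logits.max()  # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()


def reconstruct(series, num_masked=3, mask_ratio=0.5, hidden=8, seed=0):
    """Reconstruct `series` from a weighted aggregation of its masked copies."""
    rng = np.random.default_rng(seed)
    num_channels = series.shape[1]
    # Untrained random weights stand in for a learned encoder and decoder.
    w_enc = rng.normal(scale=0.5, size=(num_channels, hidden))
    w_dec = rng.normal(scale=0.5, size=(hidden, num_channels))

    masked = [random_mask(series, mask_ratio, rng) for _ in range(num_masked)]
    point_reprs = np.stack([encode(m, w_enc) for m in masked])  # (M, T, D)
    keys = np.stack([series_repr(p) for p in point_reprs])      # (M, D)
    query = series_repr(encode(series, w_enc))                   # (D,)

    weights = aggregation_weights(query, keys)                   # (M,)
    aggregated = np.tensordot(weights, point_reprs, axes=1)      # (T, D)
    recon = aggregated @ w_dec                                    # (T, C)
    loss = float(np.mean((recon - series) ** 2))                  # reconstruction MSE
    return recon, weights, loss


# Example input: a two-channel sinusoid with 64 time points.
t = np.linspace(0.0, 4.0 * np.pi, 64)
example_series = np.stack([np.sin(t), np.cos(t)], axis=1)
recon, weights, loss = reconstruct(example_series)
```

Because the weights come from a softmax, they are non-negative and sum to one, so masked copies whose sequence-level representation lies closer to that of the original contribute more to the reconstruction. In an actual pre-training setup the encoder and decoder would be trained to minimize the reconstruction loss.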