Enhancing Masked Time-Series Modeling via Dropping Patches

Tianyu Qiu, Yi Xie, Yun Xiong, Hao Niu, Xiaofeng Gao
2024-12-20
Abstract: This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence level patches of time series. On this basis, a simple yet effective method named DropPatch is proposed, which has two remarkable advantages: 1) It improves the pre-training efficiency by a square-level advantage; 2) It provides additional advantages for modeling in scenarios such as in-domain, cross-domain, few-shot learning and cold start. This paper conducts comprehensive experiments to verify the effectiveness of the method and analyze its internal mechanism. Empirically, DropPatch strengthens the attention mechanism, reduces information redundancy and serves as an efficient means of data augmentation. Theoretically, it is proved that DropPatch slows down the rate at which the Transformer representations collapse into the rank-1 linear subspace by randomly dropping patches, thus optimizing the quality of the learned representations.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is that existing masked time-series modeling methods (such as PatchTST) are limited in their ability to learn useful features. The limitations show up in two ways:

1. **Low mask ratio leads to shallow learning**: With a low mask ratio, the model does not learn the deeper features of the time series. It only recovers surface patterns, which leads to overfitting.
2. **High mask ratio leads to distracted attention**: With a high mask ratio, the attention mechanism is diluted and struggles to focus on the relevant and important parts of the data, which reduces performance on downstream tasks.

To solve these problems, the authors propose a simple and effective strategy, DropPatch. During pre-training, it randomly discards subsequence-level fragments (patches) of the time series, which improves existing masked time-series modeling.

### Main advantages of DropPatch:

- **Improve pre-training efficiency**: With fewer patches to process, computation is significantly cheaper and memory consumption is lower.
- **Enhance the attention mechanism**: The attention mechanism focuses more on multi-scale and diverse information and so captures more of the critical patterns.
- **Reduce information redundancy**: Randomly discarding patches reduces redundancy in the representation and improves the quality of the learned representation.

### Theoretical and empirical support:

- **Theoretical analysis**: It is proved that, by randomly discarding patches, DropPatch slows the rate at which the Transformer representations converge to a rank-1 linear subspace, thereby promoting feature diversity.
- **Experimental evidence**: Extensive experiments verify the effectiveness of DropPatch in different scenarios, including in-domain, cross-domain, few-shot learning, and cold-start tasks.

### Summary:

The paper proposes a new pre-training strategy, DropPatch, which aims to overcome the limitations of existing masked time-series modeling methods by randomly discarding subsequence fragments of the time series. The method not only improves pre-training efficiency but also yields significant performance improvements on multiple downstream tasks.
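The following is a minimal NumPy sketch of the idea described above, not the authors' implementation. It assumes that patches are dropped first and that masking is then applied only to the patches that remain, and it stands in a zero vector for whatever mask token the real model uses. The function names (`patchify`, `drop_then_mask`, `attention_cost_ratio`) and all parameter values are hypothetical. The last helper reflects one reading of the "square-level" efficiency claim: self-attention cost grows quadratically with the number of tokens, so keeping a fraction `1 - r` of the patches leaves roughly `(1 - r)^2` of the attention cost.

```python
import numpy as np


def patchify(series, patch_len, stride):
    """Split a 1-D series into a (num_patches, patch_len) array of windows."""
    n = (len(series) - patch_len) // stride + 1
    idx = np.arange(patch_len)[None, :] + stride * np.arange(n)[:, None]
    return series[idx]


def drop_then_mask(patches, drop_ratio, mask_ratio, rng):
    """Randomly drop patches, then mask a share of the survivors.

    Assumption: dropping happens before masking, and masked patches are
    replaced by zeros (a stand-in for a learnable mask token).
    """
    n = patches.shape[0]
    n_keep = n - int(n * drop_ratio)
    # Keep the surviving indices sorted so temporal order is preserved and
    # the original positions can still be used for positional encoding.
    keep_idx = np.sort(rng.choice(n, size=n_keep, replace=False))
    kept = patches[keep_idx]

    n_mask = int(n_keep * mask_ratio)
    mask = np.zeros(n_keep, dtype=bool)
    mask[rng.choice(n_keep, size=n_mask, replace=False)] = True

    encoder_input = kept.copy()
    encoder_input[mask] = 0.0  # reconstruction targets are kept[mask]
    return encoder_input, kept, mask, keep_idx


def attention_cost_ratio(drop_ratio):
    """Approximate share of self-attention cost left after dropping."""
    return (1.0 - drop_ratio) ** 2


# Illustrative usage with made-up sizes.
rng = np.random.default_rng(0)
series = rng.standard_normal(512)
patches = patchify(series, patch_len=12, stride=12)
encoder_input, targets, mask, keep_idx = drop_then_mask(
    patches, drop_ratio=0.5, mask_ratio=0.4, rng=rng
)
print(patches.shape, encoder_input.shape, int(mask.sum()))
print(attention_cost_ratio(0.5))
```

Returning `keep_idx` alongside the masked input is a design choice of this sketch: it lets a downstream encoder attach positional information based on where each surviving patch sat in the original series, rather than on its position after dropping.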