Cascaded Teaching Transformers with Data Reweighting for Long Sequence Time-series Forecasting

Haoyi Zhou, Chonghan Gao, Pengtao Xie, Jianxin Li
2023-01-01
Abstract: Transformer-based models have shown superior performance on the long sequence time-series forecasting problem. The sparsity assumption on the self-attention dot-product reveals that not all inputs are equally significant for Transformers. Instead of implicitly utilizing weighted time-series, we build a new learning framework in which cascaded teaching Transformers reweight samples. We formulate the framework as a multi-level optimization and design three different dataset-weight generators. Extensive experiments on five datasets show that our proposed method can significantly outperform state-of-the-art (SOTA) Transformers.
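The abstract does not spell out the cascaded-teaching algorithm or its three dataset-weight generators. As a rough illustration of what "reweighting samples through a multi-level optimization" means in general, here is a minimal sketch of a generic bi-level scheme. It is not the authors' Transformer-based method, and it rests on several assumptions. The model is a toy 1-D linear model. The inner level fits that model under fixed per-sample weights. The outer level adjusts the weights, parameterized as softmax logits, to lower a held-out validation loss. All names (`fit_weighted`, `reweight`, `outer_objective`) and the toy data are hypothetical, and forward finite differences stand in for the hypergradients a real implementation would compute with automatic differentiation.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair; toy data, not a real time series


def fit_weighted(train: List[Sample], weights: List[float]) -> float:
    """Inner level: closed-form weighted least squares for the toy model y ~ theta * x."""
    num = sum(a * x * y for a, (x, y) in zip(weights, train))
    den = sum(a * x * x for a, (x, _) in zip(weights, train))
    return num / den


def val_loss(theta: float, val: List[Sample]) -> float:
    """Outer objective: mean squared error on a held-out validation set."""
    return sum((theta * x - y) ** 2 for x, y in val) / len(val)


def softmax(logits: List[float]) -> List[float]:
    """Map unconstrained logits to positive sample weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def outer_objective(logits: List[float], train: List[Sample], val: List[Sample]) -> float:
    # Solve the inner problem for these weights, then score the result on validation data.
    return val_loss(fit_weighted(train, softmax(logits)), val)


def reweight(train: List[Sample], val: List[Sample],
             steps: int = 200, lr: float = 0.5, eps: float = 1e-4) -> List[float]:
    """Outer level: gradient descent on the weight logits.

    Forward finite differences approximate d val_loss / d logit_i
    (a hypothetical stand-in for autodiff hypergradients).
    """
    logits = [0.0] * len(train)  # start from uniform weights
    for _ in range(steps):
        base = outer_objective(logits, train, val)
        grads = []
        for i in range(len(logits)):
            bumped = list(logits)
            bumped[i] += eps
            grads.append((outer_objective(bumped, train, val) - base) / eps)
        logits = [z - lr * g for z, g in zip(logits, grads)]
    return softmax(logits)


if __name__ == "__main__":
    # Three clean points on y = 2x plus one corrupted sample (x=2, y=10).
    train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (2.0, 10.0)]
    val = [(1.0, 2.0), (4.0, 8.0)]
    weights = reweight(train, val)
    print([round(w, 3) for w in weights])
    print(fit_weighted(train, weights))
```

On this toy data the corrupted sample ends up with the smallest weight. The refitted slope moves from 48/18 ≈ 2.67 under uniform weights toward the clean value of 2. The softmax parameterization keeps the weights positive and normalized, so the outer optimizer can only redistribute importance among samples. It cannot switch the training signal off entirely.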