AD-autoformer: decomposition transformers with attention distilling for long sequence time-series forecasting

Cao, Danyang
DOI: https://doi.org/10.1007/s11227-024-06266-8
IF: 3.3
2024-06-07
The Journal of Supercomputing
Abstract: Long-term forecasting serves practical applications such as predicting the development trend of infectious diseases and planning electricity consumption. In this paper, we study long-term time-series forecasting. Previous Transformer-based models have shown potential to improve prediction capability, but they also have problems, such as a lack of position information and slow training. To address these problems, we design an efficient Transformer-based model, AD-Autoformer, built specifically for long-term series prediction. Position embedding helps the model capture the patterns and relationships in a sequence, which improves its performance and generalization on sequence data. The self-attention distilling mechanism compresses and accelerates the model by halving the input to each cascading layer so that the dominant attention is highlighted. This significantly reduces computational complexity and speeds up training while maintaining performance. Experimental results on five large datasets show that AD-Autoformer improves the MSE and MAE metrics to varying degrees compared with other benchmark methods.
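The two mechanisms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which position embedding is used, so a fixed sinusoidal encoding is assumed here, and the distilling step is reduced to a stride-2 max-pool over the time axis (a full distilling layer would typically also apply a convolution and an activation before pooling). The function names `sinusoidal_position` and `distill_halve` and all parameter values are hypothetical.

```python
import math


def sinusoidal_position(pos, d_model):
    """Fixed sinusoidal encoding for one time step (assumed embedding choice)."""
    vec = []
    for i in range(d_model):
        # Each sin/cos pair (indices 2k, 2k+1) shares one frequency.
        angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def distill_halve(seq, kernel=3, stride=2, pad=1):
    """Max-pool along the time axis so the next layer sees about half the steps.

    seq is a list of time steps, each a list of feature values.
    Padded positions are simply left out of the pooling window.
    """
    length, dim = len(seq), len(seq[0])
    out_len = (length + 2 * pad - kernel) // stride + 1
    out = []
    for t in range(out_len):
        start = t * stride - pad
        window = [seq[j] for j in range(start, start + kernel) if 0 <= j < length]
        out.append([max(row[c] for row in window) for c in range(dim)])
    return out


# Toy usage: encode 8 time steps, then halve the sequence twice.
encoded = [sinusoidal_position(t, 4) for t in range(8)]
once = distill_halve(encoded)
twice = distill_halve(once)
print(len(encoded), len(once), len(twice))
```

Because each cascaded layer receives roughly half the time steps of the previous one, the cost of the attention computed in later layers shrinks accordingly, which is the source of the speed-up the abstract describes.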
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture