Make Active Attention More Active: Using Lipschitz Regularity to Improve Long Sequence Time-Series Forecasting

Xiangxu Meng, Wei Li, Wenqi Zheng, Zheng Zhao, Guangsheng Feng, Huiqiang Wang
DOI: https://doi.org/10.1007/978-981-99-4742-3_13
2023-01-01
Abstract: Long-term time series prediction aims to forecast the future accurately by analyzing the temporal patterns present in historical input data. Encoder-decoder architectures built on a self-attention mechanism have achieved impressive results on this task and remain at the forefront of the field. To address the quadratic time complexity of self-attention in the sequence length, researchers have sought to exploit the long-tailed distribution of self-attention scores, using selection-based approaches that improve performance as well as model efficiency. However, these efforts have focused primarily on practical implementation rather than on exploring the underlying theoretical principles and reinforcing their effectiveness. Inspired by the increasing disparity between graph nodes in graph neural networks (GNNs), in this work we investigate how the long-tailed distribution of self-attention scores affects prediction accuracy. We propose a novel approach that enhances the distinction between self-attention scores and thereby improves performance. We incorporate this approach into a state-of-the-art model and validate its effectiveness through theoretical analysis and visual verification. Extensive experiments on four large datasets show that our method outperforms existing methods: it reduces MSE by 18% and MAE by 12% on average, and it also reduces time consumption by 38%.
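The "selection idea" that the abstract attributes to prior work can be illustrated with a small pure-Python sketch. It assumes a selection scheme in the style of Informer's ProbSparse attention: each query is scored by the max-minus-mean of its scaled dot-product logits, only the top-u queries receive full attention, and the remaining ("lazy") queries fall back to the mean of the value vectors. This is an illustrative stand-in for the selection-based approaches mentioned above, not the method proposed in this paper, and all function names (`sparsity_measure`, `select_active_queries`, `selective_attention`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_scores(q: Vector, keys: Matrix) -> Vector:
    # Scaled dot-product logits q . k_j / sqrt(d) for a single query.
    d = len(q)
    return [dot(q, k) / math.sqrt(d) for k in keys]


def sparsity_measure(q: Vector, keys: Matrix) -> float:
    # Max-minus-mean of the logits (assumed Informer-style approximation).
    # A large value means the query's attention is far from uniform,
    # i.e. it sits in the "head" of the long-tailed score distribution.
    s = scaled_scores(q, keys)
    return max(s) - sum(s) / len(s)


def select_active_queries(queries: Matrix, keys: Matrix, u: int) -> List[int]:
    # Hypothetical helper: indices of the u queries with the largest measure.
    ranked = sorted(range(len(queries)),
                    key=lambda i: sparsity_measure(queries[i], keys),
                    reverse=True)
    return ranked[:u]


def selective_attention(queries: Matrix, keys: Matrix,
                        values: Matrix, u: int) -> Matrix:
    # Full softmax attention for the selected queries only; every other
    # ("lazy") query receives the mean of the value vectors.
    d_v = len(values[0])
    mean_v = [sum(v[c] for v in values) / len(values) for c in range(d_v)]
    active = set(select_active_queries(queries, keys, u))
    out: Matrix = []
    for i, q in enumerate(queries):
        if i in active:
            w = softmax(scaled_scores(q, keys))
            out.append([sum(w[j] * values[j][c] for j in range(len(values)))
                        for c in range(d_v)])
        else:
            out.append(list(mean_v))
    return out


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 0.0], [4.0, 0.0]]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    print(select_active_queries(queries, keys, u=1))  # [2]
    print(selective_attention(queries, keys, values, u=1))
```

In the usage example, query 2 has the most peaked score distribution and is the one selected; queries 0 and 1 receive the mean value vector. For clarity the sketch scores every query against every key, so it is still quadratic in sequence length; Informer instead estimates the measure from a random subset of keys to bring the cost down to O(L log L).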