DARKER: Efficient Transformer with Data-Driven Attention Mechanism for Time Series
Rundong Zuo, Guozhong Li, Rui Cao, Byron Choi, Jianliang Xu, Sourav S Bhowmick
DOI: https://doi.org/10.14778/3681954.3681996
IF: 2.5
2024-07-01
Proceedings of the VLDB Endowment
Abstract: Transformer-based models have facilitated numerous applications with superior performance. A key challenge in transformers is the quadratic dependency of their training time complexity on the length of the input sequence. A recent popular solution is to use random feature attention (RFA) to approximate the costly vanilla attention mechanism. However, RFA relies on only a single, fixed projection for approximation, which does not capture the input distribution and can lead to low efficiency and accuracy, especially on time series data. In this paper, we propose DARKER, an efficient transformer with a novel DAta-dRiven KERnel-based attention mechanism. To present the technical details precisely, this paper discusses them in the context of a fundamental time series task, namely time series classification (TSC). First, the main novelty of DARKER lies in approximating the softmax kernel by learning multiple machine learning models with trainable weights as multiple projections offline, moving beyond the limitation of a fixed projection. Second, we propose a projection index (called pIndex) to efficiently search for the most suitable projection for the input when training the transformer. As a result, the overall time complexity of DARKER is linear in the input length. Third, we propose an indexing technique for efficiently computing the inputs required for transformer training. Finally, we evaluate our method on 14 real-world and 2 synthetic time series datasets. The experiments show that DARKER is 3×-4× faster than the vanilla transformer and 1.5×-3× faster than other state-of-the-art (SOTA) models on long sequences. In addition, the accuracy of DARKER is comparable to or higher than that of all compared transformers.
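To make the contrast between vanilla attention and a single fixed random projection concrete, the following is a minimal sketch in Python with NumPy. It does not reproduce DARKER: the learned projections and pIndex are not shown. It illustrates only the baseline idea the abstract describes, using one fixed Gaussian projection `w`. The positive exponential feature map, the function names, and all sizes are assumptions chosen for illustration. Published random feature methods differ in the feature map they use.

```python
import numpy as np

def positive_random_features(x, w):
    """Map rows of x (n, d) to m positive features.

    For w with i.i.d. standard normal rows, the dot product of two feature
    vectors is an unbiased estimate of exp(q . k).
    """
    m = w.shape[0]
    half_sq_norm = 0.5 * np.sum(x * x, axis=-1, keepdims=True)
    return np.exp(x @ w.T - half_sq_norm) / np.sqrt(m)

def softmax_attention(q, k, v):
    """Vanilla attention: builds an (n, n) score matrix, quadratic in n."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def random_feature_attention(q, k, v, w):
    """Approximate softmax attention with one fixed projection w (m, d).

    Cost is O(n * m * d) because the (n, n) matrix is never formed.
    """
    scale = q.shape[-1] ** -0.25        # (q*scale) . (k*scale) = q . k / sqrt(d)
    phi_q = positive_random_features(q * scale, w)   # (n, m)
    phi_k = positive_random_features(k * scale, w)   # (n, m)
    kv = phi_k.T @ v                    # (m, d_v): summarises keys and values
    normaliser = phi_q @ phi_k.sum(axis=0)           # (n,)
    return (phi_q @ kv) / normaliser[:, None]

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
n, d, m = 64, 8, 256
q = 0.3 * rng.standard_normal((n, d))
k = 0.3 * rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
w = rng.standard_normal((m, d))         # the single, fixed projection

exact = softmax_attention(q, k, v)
approx = random_feature_attention(q, k, v, w)
print("mean absolute error:", np.abs(exact - approx).mean())
```

Because `w` is sampled once and independently of the data, the quality of the approximation depends on how well that draw suits the inputs. This is the limitation the abstract attributes to RFA and addresses by learning multiple projections.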
computer science, information systems, theory & methods