Enhancing multivariate time-series anomaly detection with positional encoding mechanisms in transformers
Abdul Amir Alioghli, Feyza Yıldırım Okay
DOI: https://doi.org/10.1007/s11227-024-06694-6
IF: 3.3
2024-12-13
The Journal of Supercomputing
Abstract: The surge in automation driven by IoT devices has generated extensive time-series data with highly variable features, which makes anomaly detection challenging. Deep learning (DL), particularly Transformer networks, has shown promise in addressing these challenges. However, Transformer networks struggle to determine the positions of data points and to preserve the order of data in sequences, which led to the development of Positional Encoding (PE). Absolute PE was introduced first, and newer methods such as Relative PE and Rotary PE have since been adopted in natural language processing tasks to improve performance. This study evaluates Absolute PE, Rotary PE, and two modifications of Relative PE (Representative attention and Global attention) for multivariate time-series anomaly detection. The experimental results indicate that Absolute PE, with a 98% accuracy score, performs well across different window sizes. Representative attention, with a 98% F1-score, performs best for short sequences (lengths 8, 16, and 32), whereas Global attention, with a 97% F1-score, is more effective for longer sequences (lengths 64 and 128). In addition, Absolute PE has the shortest training times, starting at 25 for sequence length 8 and increasing to 192 for length 128. Rotary PE has slightly longer training times than Absolute PE. Representative attention, on the other hand, consistently has the longest training times, starting at 48 for length 8 and reaching 366 for length 128. Overall, Absolute PE and Global attention are the most time-efficient, while Representative attention has significantly higher training times, particularly for long sequences.
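The abstract compares positional encodings without showing how they work. As a rough illustration, the sketch below implements the standard sinusoidal Absolute PE from the original Transformer and the standard Rotary PE (RoPE) formulation. It assumes NumPy, the conventional base of 10000, and an even model dimension. This is not the authors' implementation, and it does not cover the paper's Representative or Global attention variants of Relative PE. The function names `sinusoidal_pe` and `apply_rope` and the window shape in the demo are hypothetical.

```python
import numpy as np


def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal absolute PE; returns shape (seq_len, d_model).

    Even feature 2i gets sin(pos / 10000^(2i/d)), odd feature 2i+1 gets cos.
    """
    assert d_model % 2 == 0, "d_model must be even"
    pos = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]                # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)   # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def apply_rope(x: np.ndarray) -> np.ndarray:
    """Standard rotary PE for x of shape (seq_len, d), d even.

    Each feature pair (x[2i], x[2i+1]) at position m is rotated by the
    angle m * theta_i, with theta_i = 10000^(-2i/d).
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "feature dimension must be even"
    theta = 1.0 / np.power(10000.0, np.arange(d // 2) * 2 / d)  # (d/2,)
    angles = np.arange(seq_len)[:, None] * theta[None, :]      # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty(x.shape, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin  # 2-D rotation, first component
    out[:, 1::2] = x_even * sin + x_odd * cos  # 2-D rotation, second component
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical window: 16 time steps, already embedded to 8 features.
    window = rng.normal(size=(16, 8))
    x_abs = window + sinusoidal_pe(16, 8)  # absolute PE is added to the inputs

    # RoPE is applied to queries and keys; using the same q and k vector at
    # every position shows that q_m . k_n depends only on the offset m - n.
    q = apply_rope(np.tile(rng.normal(size=(1, 8)), (16, 1)))
    k = apply_rope(np.tile(rng.normal(size=(1, 8)), (16, 1)))
    print(q[3] @ k[1], q[7] @ k[5])  # both pairs have offset 2
```

The design difference matters for the comparison in the abstract: Absolute PE is added once to the inputs before the encoder, whereas RoPE rotates queries and keys inside each attention layer, so attention scores encode the relative offset between time steps rather than their absolute positions.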
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture