Abstract: Real-time monitoring and accurate prediction of key variables are indispensable for ensuring that industrial production proceeds as expected. With growing volumes of measurement data and improved hardware computing power, the Transformer and its variants, owing to their excellent capability in extracting global dependencies, are playing an increasingly important role among deep learning-based multidimensional time series prediction models. In addition, from the perspective of causality, cause variables contain part of the information in effect variables and can reduce the uncertainty of effect variables, which is beneficial for prediction. However, research on combining the Transformer with causal feature analysis has been relatively limited. To exploit the advantages of both, this paper introduces the Causal-Transformer (CT) model, which uses semi-orthogonal projection to extract causal features from multiple input variables. A multi-head spatial-temporal causal attention mechanism, based on the classical Transformer model, is designed in the encoder block to simultaneously reduce feature dimensions and extract implicit causal features in both the temporal and spatial dimensions. The CT also uses Granger causality analysis to select the causal teaching indicators of the target variables, providing stable assistance by injecting explicit causality into the inputs of the decoder block. By leveraging more condensed and independent causal features, the CT possesses inherent advantages in predicting time series variables. Case study results show that the CT model outperforms the other models on the diesel refinery dataset, reducing MSE by 46.0% and 30.4% relative to the classic Transformer and Informer, respectively, in five-step prediction. Copyright (C) 2024 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
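
The abstract mentions Granger causality analysis as the way to select causal teaching indicators, without giving the procedure. As a rough illustration only, a textbook pairwise Granger F-test can be sketched with NumPy as follows. The function names (`lagged`, `granger_f_stat`), the lag order of 2, and the synthetic series are assumptions made for this sketch and are not taken from the paper; the CT model's actual selection procedure may differ.

```python
import numpy as np


def lagged(series, lags):
    """Stack series[t-1], ..., series[t-lags] as columns for t = lags..n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])


def granger_f_stat(cause, effect, lags=2):
    """F statistic for the hypothesis 'cause Granger-causes effect'.

    Compares an autoregression of `effect` on its own lags (restricted)
    with one that also includes lags of `cause` (unrestricted).
    """
    target = effect[lags:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(effect, lags)])
    unrestricted = np.hstack([restricted, lagged(cause, lags)])

    def rss(design):
        # Residual sum of squares of an ordinary least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Hypothetical synthetic data: x drives y with a one-step delay.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)

f_xy = granger_f_stat(x, y)  # x -> y: expected to be large
f_yx = granger_f_stat(y, x)  # y -> x: expected to be small
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

In a setting like the one the abstract describes, each candidate input variable would be tested against the target in this way, and those with a significant F statistic could be kept as candidate indicators. The significance threshold and lag order would have to be chosen for the data at hand.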

Position Information Emerges in Causal Transformers Without Positional Encodings via Similarity of Nearby Embeddings

Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings

How Transformers Learn Causal Structure with Gradient Descent

A Simple and Effective Positional Encoding for Transformers

Causal Interpretation of Self-Attention in Pre-Trained Transformers

A Meta-Learning Perspective on Transformers for Causal Language Modeling

What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding

Towards Understanding the Universality of Transformers for Next-Token Prediction

Breaking Symmetry When Training Transformers

Conditional Positional Encodings for Vision Transformers

Rethinking Position Embedding Methods in the Transformer Architecture

Improve Transformer Models with Better Relative Position Embeddings

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

Learning positional encodings in transformers depends on initialization

A bio-inspired positional embedding network for transformer-based models

Positional Encodings for Light Curve Transformers: Playing with Positions and Attention

Transformers are Universal In-context Learners

Reach the Remote Neighbors: Dual-Encoding Transformer for Graphs

Causal-Transformer: Spatial-temporal Causal Attention-Based Transformer for Time Series Prediction

Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task

Transformers with Sparse Attention for Granger Causality