Abstract: Real-time monitoring and accurate prediction of key variables are indispensable for ensuring that industrial production proceeds as expected. With growing volumes of measurement data and improved hardware computing power, the Transformer and its variants, owing to their ability to extract global dependencies, play an increasingly important role among deep learning-based multidimensional time series prediction models. From the perspective of causality, cause variables carry part of the information in effect variables and can reduce the uncertainty of effect variables, which benefits prediction. However, research combining the Transformer with causal feature analysis remains relatively limited. To exploit both advantages, this paper introduces the Causal-Transformer (CT) model, which uses semi-orthogonal projection to extract causal features from multiple input variables. A multi-head spatial-temporal causal attention mechanism, built on the classical Transformer model, is designed in the encoder block to simultaneously reduce feature dimensions and extract implicit causal features in both the temporal and spatial dimensions. The CT also uses Granger causality analysis to select causal teaching indicators for the target variables, providing stable assistance by injecting explicit causality into the inputs of the decoder block. By leveraging more condensed and independent causal features, the CT has inherent advantages in predicting time series variables. Case study results show that the CT model outperforms the other models on the diesel refinery dataset, in particular reducing MSE by 46.0% and 30.4% relative to the classic Transformer and Informer, respectively, in five-step prediction. Copyright (C) 2024 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
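
The abstract says Granger causality analysis is used to select causal indicators for the target variables. As a rough illustration of that general idea only, the sketch below computes a textbook Granger F-statistic by comparing an autoregression of the effect on its own lags with one that also includes lags of the candidate cause. It is not the authors' implementation: the function names `lagged` and `granger_f_stat`, the fixed lag order, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def lagged(series, lags, start):
    # Columns are series[t-1], ..., series[t-lags] for t = start .. len(series)-1.
    n = len(series)
    return np.column_stack([series[start - k : n - k] for k in range(1, lags + 1)])

def granger_f_stat(cause, effect, lags=2):
    """F-statistic for the hypothesis that `cause` Granger-causes `effect`.

    Hypothetical helper: a fixed lag order is assumed for simplicity.
    """
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    y = effect[lags:]
    ones = np.ones((len(y), 1))
    # Restricted model: the effect is explained by its own past only.
    restricted = np.hstack([ones, lagged(effect, lags, lags)])
    # Unrestricted model: the past of the candidate cause is added.
    unrestricted = np.hstack([restricted, lagged(cause, lags, lags)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(y) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic data (an assumption for the demo): x drives y with a one-step delay.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)

f_xy = granger_f_stat(x, y)  # x as candidate cause of y
f_yx = granger_f_stat(y, x)  # y as candidate cause of x
print(f_xy, f_yx)
```

A larger F-statistic means the candidate's past adds more predictive information about the target than the target's own past already provides. A selection step could rank candidate variables by this statistic or by the corresponding p-value.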

A Meta-Learning Perspective on Transformers for Causal Language Modeling

How Transformers Learn Causal Structure with Gradient Descent

Transformer-based Causal Language Models Perform Clustering

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding

Estimating the Causal Effects of Natural Logic Features in Transformer-Based NLI Models

Causal Interpretation of Self-Attention in Pre-Trained Transformers

Teaching Transformers Causal Reasoning through Axiomatic Training

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data

Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically

Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning

CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

Causal-Transformer: Spatial-temporal Causal Attention-Based Transformer for Time Series Prediction

Transformers are Universal In-context Learners

Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles

Causal Deep Learning: Causal Capsules and Tensor Transformers

CausaLM: Causal Model Explanation Through Counterfactual Language Models

Analyzing Transformer Dynamics as Movement through Embedding Space

Towards Understanding the Universality of Transformers for Next-Token Prediction

Does learning the right latent variables necessarily improve in-context learning?