Abstract: Real-time monitoring and accurate prediction of key variables are indispensable for keeping industrial production on course. As measurement data volumes grow and hardware computing power improves, the Transformer and its variants, with their strong ability to extract global dependencies, play an increasingly important role among deep learning-based multidimensional time series prediction models. From a causal perspective, cause variables carry part of the information in effect variables and reduce their uncertainty, which benefits prediction. However, research combining the Transformer with causal feature analysis remains limited. To exploit both advantages, this paper introduces the Causal-Transformer (CT) model, which uses semi-orthogonal projection to extract causal features from multiple input variables. Building on the classical Transformer, a multi-head spatial-temporal causal attention mechanism is designed in the encoder block to simultaneously reduce feature dimensions and extract implicit causal features in both the temporal and spatial dimensions. The CT also uses Granger causality analysis to select causal teaching indicators for the target variables, which provide stable assistance by injecting explicit causality into the decoder block's inputs. By leveraging more condensed and independent causal features, the CT has inherent advantages in predicting time series variables. Case study results show that the CT model outperforms the other models on the diesel refinery dataset; in five-step prediction, it reduces MSE by 46.0% and 30.4% relative to the classic Transformer and the Informer, respectively.

Copyright (C) 2024 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
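
The abstract says Granger causality analysis is used to select causal teaching indicators for a target variable. As a rough illustration of that idea only, the sketch below computes the standard Granger F-statistic with ordinary least squares in NumPy. The function names (`lagged`, `granger_f_stat`), the lag order, and the synthetic data are assumptions made for this example; they are not taken from the paper and do not reproduce the authors' implementation.

```python
import numpy as np


def lagged(series, max_lag):
    """Return columns series[t-1], ..., series[t-max_lag] for t = max_lag..n-1."""
    n = len(series)
    return np.column_stack(
        [series[max_lag - k : n - k] for k in range(1, max_lag + 1)]
    )


def granger_f_stat(cause, effect, max_lag=2):
    """F-statistic for 'cause Granger-causes effect' (hypothetical helper).

    Compares an autoregression of `effect` on its own lags (restricted)
    with one that also includes lags of `cause` (unrestricted).
    """
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[max_lag:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged(effect, max_lag)])
    unrestricted = np.hstack([restricted, lagged(cause, max_lag)])

    def rss(design):
        # Residual sum of squares of the least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / max_lag) / (rss_u / dof)


# Synthetic example (assumed data): x drives y with a one-step delay.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

f_xy = granger_f_stat(x, y)  # evidence that x helps predict y
f_yx = granger_f_stat(y, x)  # evidence that y helps predict x
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.1f}")
```

In a selection setting, one would compute this statistic for each candidate input variable against the target and keep those with the strongest evidence. How the CT model thresholds or ranks candidates is not stated in the abstract.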

CAT: Causal Audio Transformer for Audio Classification

Audio Transformers: Transformer Architectures For Large Scale Audio Understanding. Adieu Convolutions

CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations

CAT: Cross Attention in Vision Transformer

CT-SAT: Contextual Transformer for Sequential Audio Tagging

Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

VSET: A Multimodal Transformer for Visual Speech Enhancement

MuSLCAT: Multi-Scale Multi-Level Convolutional Attention Transformer for Discriminative Music Modeling on Raw Waveforms

Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

AAT: Adapting Audio Transformer for Various Acoustics Recognition Tasks

EAViT: External Attention Vision Transformer for Audio Classification

Context-Aware Transformer for image captioning

PointCAT: Cross-Attention Transformer for point cloud

An Improved Audio Classification Method Based on Parameter-Free Attention Combined with Self-Supervision

Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement

MCT-VHD: Multi-modal contrastive transformer for video highlight detection

Improving Domain Generalization for Sound Classification with Sparse Frequency-Regularized Transformer

Causal-Transformer: Spatial-temporal Causal Attention-Based Transformer for Time Series Prediction

On the Power of Convolution Augmented Transformer