Abstract: Traditional fault diagnosis methods using Convolutional Neural Networks (CNNs) often struggle to capture the temporal dynamics of vibration signals. To overcome this, the application of Vision Transformer (ViT) methods to fault diagnosis is gaining traction. Nonetheless, these methods typically require extensive preprocessing, which increases computational complexity and can reduce the efficiency of the diagnosis process. Addressing this gap, this paper presents the Time Series Vision Transformer (TSViT), tailored for effective fault diagnosis. TSViT incorporates a convolutional layer to extract local features from vibration signals, alongside a transformer encoder to discern long-term temporal patterns. A thorough experimental comparison on three diverse datasets demonstrates TSViT's effectiveness and adaptability. Moreover, the paper examines the influence of hyperparameter tuning on the model's performance, computational demand, and parameter count. TSViT achieves 100% average accuracy on two test sets and 99.99% on the third, showcasing its diagnostic capabilities.

What problem does this paper attempt to address?

This paper addresses the difficulty that traditional fault diagnosis methods have in capturing the temporal dynamics of vibration signals from rotating machinery. Convolutional neural networks (CNNs) extract local features well, but their convolutional filters limit their ability to capture global information, so they cannot effectively model long-range dependencies in a time series. Vision Transformer (ViT) methods capture long-range dependencies better, but they usually require many preprocessing steps, which increases computational complexity and reduces the efficiency of the diagnosis process.

To solve these problems, the paper proposes the Time Series Vision Transformer (TSViT). TSViT combines convolutional layers, which extract local features, with Transformer encoders, which identify long-term temporal patterns, to achieve comprehensive spatial and temporal feature extraction. Experiments on three different datasets show average accuracy of 100% on two test sets and 99.99% on the third.

### Main contributions:

1. **Propose the TSViT model**: A Time Series Vision Transformer model for fault diagnosis that can directly process raw time-series signals.
2. **Develop the time-series patch embedding method**: Enables TSViT to accept one-dimensional or multi-dimensional time-domain signals as input, not just image data.
3. **Design experiments**: Experiments on three different datasets show that TSViT achieves high-accuracy fault diagnosis without any preprocessing techniques.

### Method overview:

- **Embedding layer**: Includes time-series patch embedding, class token, and position embedding.
- **Transformer encoder layer**: Consists of Multi-head Self-Attention, Multi-Layer Perceptron (MLP), Residual Connection, and Layer Normalization.
- **Classification layer**: Converts the features extracted by the Transformer encoder into a one-hot encoding for pattern recognition.

### Experimental results:

- **PBR dataset**: The training and test losses gradually stabilize and approach 0; the training and test accuracies gradually stabilize and approach 100%.
- **CWRU dataset**: In 10 trials, the maximum accuracy (MaxAcc) is 100%, the minimum accuracy (MinAcc) is 99.96%, and the average accuracy (AvgAcc) is 99.99%.
- **XJTU dataset**: In 10 trials, the maximum accuracy, the minimum accuracy, and the average accuracy are all 100%.

### Performance in a noisy environment:

- In real industrial scenarios, the collected vibration signals usually contain different levels of noise. The results show that TSViT still performs well in a noisy environment, and that the influence of noise is smaller when the dataset is large.

### Hyperparameter analysis:

- The paper also explores the influence of different hyperparameter values on model performance, computational requirements, and the number of parameters, further verifying the robustness and effectiveness of TSViT.

In conclusion, TSViT addresses the limitations of traditional fault diagnosis methods in handling vibration signals of rotating machinery by combining convolutional layers and Transformer encoders, and improves the accuracy and efficiency of fault diagnosis.
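The embedding layer in the method overview (time-series patch embedding, class token, position embedding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `time_series_patch_embedding`, the patch length, the embedding size, and the random weights standing in for learned parameters are all assumptions made for illustration. The sketch assumes non-overlapping patches, in which case a linear projection of each patch is equivalent to a 1-D convolution whose kernel size and stride both equal the patch length.

```python
import numpy as np


def time_series_patch_embedding(signal, patch_len, embed_dim, rng):
    """Turn a raw signal into a token sequence of shape (num_patches + 1, embed_dim).

    `signal` is either 1-D (length,) or 2-D (channels, length).
    All weights are random placeholders for parameters a real model would learn.
    """
    signal = np.atleast_2d(np.asarray(signal, dtype=np.float64))
    channels, length = signal.shape
    num_patches = length // patch_len

    # Drop trailing samples that do not fill a whole patch.
    trimmed = signal[:, : num_patches * patch_len]

    # (channels, num_patches, patch_len) -> (num_patches, channels * patch_len)
    patches = (
        trimmed.reshape(channels, num_patches, patch_len)
        .transpose(1, 0, 2)
        .reshape(num_patches, channels * patch_len)
    )

    # Linear projection of each patch; equivalent to a strided 1-D convolution
    # with kernel size == stride == patch_len.
    weight = rng.standard_normal((channels * patch_len, embed_dim)) * 0.02
    bias = np.zeros(embed_dim)
    tokens = patches @ weight + bias

    # Prepend a class token, then add a position embedding to every token.
    class_token = rng.standard_normal((1, embed_dim)) * 0.02
    tokens = np.concatenate([class_token, tokens], axis=0)
    position = rng.standard_normal((num_patches + 1, embed_dim)) * 0.02
    return tokens + position


rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 20.0 * np.pi, 1024))  # synthetic 1-D "vibration" signal
tokens = time_series_patch_embedding(signal, patch_len=64, embed_dim=32, rng=rng)
print(tokens.shape)  # (17, 32): 16 patches plus one class token
```

The resulting token sequence is what a Transformer encoder would consume; in a classifier of this kind, the output at the class-token position is typically passed to the classification layer.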

TSViT: A Time Series Vision Transformer for Fault Diagnosis

A Time Series Transformer based method for the rotating machinery fault diagnosis

A new rotating machinery fault diagnosis method based on the Time Series Transformer

Convolutional Neural Network-Based Transformer Fault Diagnosis Using Vibration Signals

Rotating Machinery Fault Diagnosis Under Multiple Working Conditions via a Time-Series Transformer Enhanced by Convolutional Neural Network

DEViT: Deformable Convolution-Based Vision Transformer for Bearing Fault Diagnosis

Multi-channel Fused Vision Transformer Network for Bearing Fault Diagnosis under Different Working Conditions

A Novel Hierarchical Vision Transformer and Wavelet Time–Frequency Based on Multi-Source Information Fusion for Intelligent Fault Diagnosis

Fault data enhancement and real-time diagnosis using optimized ViT ++ algorithm for electric drive system

Research on a Transformer Vibration Fault Diagnosis Method Based on Time-Shift Multiscale Increment Entropy and CatBoost

Multiscale Time-Frequency Sparse Transformer Based on Partly Interpretable Method for Bearing Fault Diagnosis

Bearing Fault Diagnosis Based on an Enhanced Image Representation Method of Vibration Signal and Conditional Super Token Transformer

Application of Vision-Series Transformer in Screening for Coronary Heart Diseases Using Coronary CT Angiography.

Variational Attention-Based Interpretable Transformer Network for Rotary Machine Fault Diagnosis

A Multi-Information Fusion ViT Model and Its Application to the Fault Diagnosis of Bearing with Small Data Samples

Local feature expansion Vision Transformer model for bearing fault diagnosis under noise environments

CRViT: Vision transformer advanced by causality and inductive bias for image recognition

A transformer model with enhanced feature learning and its application in rotating machinery diagnosis

A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings

HSViT: Horizontally Scalable Vision Transformer

Time-series vision transformer based on cross space-time attention for fault diagnosis in fused deposition modelling with reconstruction of layer-wise data