Abstract: Vision Transformers (ViTs), with their outstanding performance, have become a popular backbone of deep learning models for mainstream vision tasks including classification, object detection, and segmentation. Beyond performance, reliability is also a critical metric for the adoption of ViTs in safety-critical applications such as autonomous driving and robotics. Observing that the major computing blocks in ViTs, such as multi-head attention and feed-forward layers, are usually performed with general matrix multiplication (GEMM), we propose to apply the classical algorithm-based fault tolerance (ABFT) strategy originally developed for GEMM to protect ViTs against soft errors in the underlying computing engines. Unlike classical ABFT, which invokes the expensive error recovery procedure whenever computing errors are detected, we leverage the inherent fault tolerance of ViTs and propose an approximate ABFT, namely ApproxABFT, which invokes the error recovery procedure only when the computing errors are significant enough. This skips many unnecessary error recovery procedures and simplifies the overall GEMM error recovery. ApproxABFT also relaxes the error threshold in the error recovery procedure and ignores minor computing errors, which reduces the error recovery complexity and improves the error recovery quality. In addition, we apply a fine-grained blocking strategy to ApproxABFT and split GEMMs of distinct sizes into smaller sub-blocks, which smooths the error thresholds across ViTs and further improves the error recovery quality. According to our experiments, ApproxABFT reduces the computing overhead by 25.92% to 81.62% and improves the model accuracy by 2.63% to 72.56% compared to the baseline ABFT, while the blocking optimization further reduces the computing overhead by 6.56% to 73.5% with comparable accuracy.

What problem does this paper attempt to address?

The paper aims to improve the reliability of Vision Transformers (ViTs) against soft errors while reducing the computational overhead of traditional algorithm-based fault tolerance (ABFT). Specifically, it focuses on the following points:

1. **Improving the fault tolerance of ViTs**: ViTs perform very well in safety-critical applications such as autonomous driving and robotics, but their computationally intensive nature makes them vulnerable to soft errors at the hardware level. These soft errors can corrupt calculation results and thereby degrade the final accuracy of the model, so an effective mechanism is needed to protect ViTs from them.
2. **Reducing the computational overhead of ABFT**: Traditional ABFT strategies can effectively detect and correct calculation errors, but their high computational cost limits their adoption in practice. At high error rates in particular, frequent error recovery significantly increases the overhead. The paper therefore proposes Approximate ABFT (ApproxABFT), which relaxes the criteria for error detection and recovery and skips unnecessary error recovery procedures.
3. **Improving the quality of error recovery**: Beyond reducing overhead, the paper also aims to improve the quality of error recovery. A fine-grained blocking strategy decomposes large matrix multiplications (GEMM) into smaller sub-blocks, which smooths the error thresholds and further improves the error recovery quality.
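
The points above can be illustrated with a minimal sketch of checksum-based ABFT for GEMM with a relaxed threshold. This is not the authors' implementation: the function names (`matmul`, `checksum_diffs`, `approx_abft`, `blocked_approx_abft`), the `threshold` and `block_rows` parameters, the single-element correction rule, and the fall-back to full recomputation are all assumptions chosen for illustration. Setting `threshold` to 0 roughly corresponds to classical ABFT, where every mismatch triggers recovery.

```python
def matmul(a, b):
    """Plain GEMM on lists of lists (stand-in for the real compute engine)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def checksum_diffs(a, b, c):
    """Signed row/column checksum mismatches between c and the true a @ b."""
    b_row_sums = [sum(row) for row in b]        # b @ 1
    a_col_sums = [sum(col) for col in zip(*a)]  # 1^T @ a
    # Row i of a @ b must sum to a[i] . (b @ 1).
    row_diff = [sum(c_row) - sum(x * s for x, s in zip(a_row, b_row_sums))
                for a_row, c_row in zip(a, c)]
    # Column j of a @ b must sum to (1^T @ a) . b[:, j].
    col_diff = [sum(c_col) - sum(s * x for s, x in zip(a_col_sums, b_col))
                for b_col, c_col in zip(zip(*b), zip(*c))]
    return row_diff, col_diff


def approx_abft(a, b, c, threshold):
    """Hypothetical approximate ABFT: recover only if a mismatch exceeds threshold."""
    row_diff, col_diff = checksum_diffs(a, b, c)
    bad_rows = [i for i, d in enumerate(row_diff) if abs(d) > threshold]
    bad_cols = [j for j, d in enumerate(col_diff) if abs(d) > threshold]
    fixed = [list(row) for row in c]            # never mutate the caller's matrix
    if not bad_rows and not bad_cols:
        return fixed, "skip"                    # minor deviations are tolerated
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        fixed[i][j] -= row_diff[i]              # single faulty element: fix in place
        return fixed, "corrected"
    return matmul(a, b), "recompute"            # assumed fall-back for multiple faults


def blocked_approx_abft(a, b, c, threshold, block_rows):
    """Apply approx_abft independently to row blocks of the output."""
    out, actions = [], []
    for start in range(0, len(a), block_rows):
        end = start + block_rows
        block, action = approx_abft(a[start:end], b, c[start:end], threshold)
        out.extend(block)
        actions.append(action)
    return out, actions


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]                            # exact a @ b is [[19, 22], [43, 50]]
minor = [[19, 22.01], [43, 50]]                 # small deviation in one element
single = [[19, 22], [143, 50]]                  # one large fault
double = [[19, 122], [143, 50]]                 # two large faults in different rows

print(approx_abft(a, b, minor, 0.1)[1])         # deviation below threshold
print(approx_abft(a, b, single, 0.1))           # one element located and corrected
print(approx_abft(a, b, double, 0.1)[1])        # too many faults for one block
print(blocked_approx_abft(a, b, double, 0.1, 1))
```

In this toy example, the whole-matrix check has to recompute the GEMM when two elements are faulty, whereas the row-blocked variant corrects each block separately. How thresholds and block sizes are actually chosen in the paper is not covered by this sketch.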
### Main contributions of the paper

- **First analysis of the fault tolerance of ViTs to soft errors**: The paper experimentally analyzes how different layers of ViTs behave under soft errors and finds that most computational deviations have little impact on model accuracy.
- **The ApproxABFT method**: By raising the thresholds for error detection and localization and ignoring small computational errors, ApproxABFT avoids unnecessary error recovery procedures and significantly reduces the computational overhead.
- **A fine-grained blocking strategy**: Decomposing GEMM into smaller sub-blocks makes error recovery more efficient and further improves model accuracy and computational efficiency.
- **Experimental validation**: Compared with the traditional ABFT method, ApproxABFT reduces the computational overhead by 25.92% to 81.62% and improves the model accuracy by 2.63% to 72.56%. The blocking optimization further reduces the computational overhead by 6.56% to 73.5% while maintaining comparable accuracy.

### Conclusion

By proposing ApproxABFT, the paper addresses both the reliability and the computational overhead problems of ViTs under soft errors, supporting wider adoption of ViTs in safety-critical applications.

ApproxABFT: Approximate Algorithm-Based Fault Tolerance for Vision Transformers
