An Improved Transformer U-Net-Based Reconstruction Method for Unsupervised Anomaly Detection

Jialong Tang, Zheng Lv, Ying Liu, Jun Zhao
2024-01-01
Abstract: In this paper, we propose a new anomaly detection method, U-NET-VIT (U-ViT). We embed a block composed of Vision Transformer (ViT) layers into a U-Net structure and use the resulting network as both an image reconstructor and an image discriminator. To make ViT better suited to anomaly detection, we modify it in two ways. First, we adopt window-based multi-head self-attention (W-MSA), which computes attention within local windows; this speeds up the network on high-resolution images and is better suited to detecting anomalies in object images. Second, we use a Feed-Forward Network integrated with CNN (FFN-C), that is, we add a depthwise convolutional layer to the Feed-Forward Network (FFN) to further improve ViT's ability to exploit local context information. Transformer blocks with these two modifications are called TransRecon blocks. Comparisons with other state-of-the-art methods show that the anomaly detection network built on this design performs well on the MVTec AD dataset [5], especially relative to other Transformer-based methods.
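To make the two modifications concrete, here is a minimal NumPy sketch of a TransRecon-style block: W-MSA followed by an FFN with an added convolution. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify window size, kernel size, activation, or layer ordering, so this sketch assumes Swin-style non-overlapping windows (no shifted windows, no relative position bias), pre-norm residual connections, GELU activations, and that the convolution added to the FFN is a 3x3 depthwise convolution placed between the two linear layers. All function names (`w_msa`, `ffn_c`, `trans_recon_block`, `init_params`) are hypothetical.

```python
import numpy as np

# Hypothetical TransRecon-style block (W-MSA + FFN-C), NOT the paper's code.
# Assumptions: non-overlapping windows without shift or relative position bias,
# pre-norm residuals, GELU, and a 3x3 depthwise conv between the FFN linear layers.

def layer_norm(x, eps=1e-5):
    # Normalise over the channel axis (no learnable scale/shift, for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, w):
    # (H, W, C) -> (num_windows, w*w, C); H and W must be multiples of w.
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def window_reverse(windows, w, H, W):
    # Inverse of window_partition: (num_windows, w*w, C) -> (H, W, C).
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def w_msa(x, p, w, num_heads):
    # Multi-head self-attention computed independently inside each w x w window.
    H, W, C = x.shape
    d = C // num_heads
    win = window_partition(x, w)                      # (B, N, C), N = w*w
    B, N, _ = win.shape
    qkv = (win @ p["w_qkv"]).reshape(B, N, 3, num_heads, d).transpose(2, 0, 3, 1, 4)
    q, k, v = qkv[0], qkv[1], qkv[2]                  # each (B, heads, N, d)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))  # (B, heads, N, N)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(B, N, C)
    return window_reverse(out @ p["w_proj"], w, H, W)

def depthwise_conv3x3(x, kernel):
    # One 3x3 filter per channel (kernel shape (3, 3, C)), zero padding, stride 1.
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] * kernel[i, j]
    return out

def ffn_c(x, p):
    # FFN whose hidden features are mixed with their spatial neighbours.
    h = gelu(x @ p["w1"])                        # expand C -> hidden
    h = gelu(depthwise_conv3x3(h, p["dw"]))      # local context, per channel
    return h @ p["w2"]                           # project hidden -> C

def trans_recon_block(x, p, w=4, num_heads=2):
    x = x + w_msa(layer_norm(x), p, w, num_heads)
    return x + ffn_c(layer_norm(x), p)

def init_params(C, hidden, rng, scale=0.02):
    return {
        "w_qkv": rng.normal(0, scale, (C, 3 * C)),
        "w_proj": rng.normal(0, scale, (C, C)),
        "w1": rng.normal(0, scale, (C, hidden)),
        "dw": rng.normal(0, scale, (3, 3, hidden)),
        "w2": rng.normal(0, scale, (hidden, C)),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(16, 16, 8))             # (H, W, C) feature map
    params = init_params(C=8, hidden=32, rng=rng)
    y = trans_recon_block(x, params, w=4, num_heads=2)
    print(y.shape)                               # (16, 16, 8)
```

Restricting attention to w x w windows means each attention matrix is only N x N with N = w², so the total cost grows linearly with the number of pixels H·W (roughly H·W·w² interactions) rather than quadratically as in global self-attention. This is the source of the speed-up on high-resolution images. The depthwise convolution gives each token access to its immediate spatial neighbours inside the FFN, which a purely token-wise FFN cannot do.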