RaViT-AE: Unsupervised Anomaly Detection for Intelligent Cultural Heritage Monitoring Using Region-Attentive ViT Autoencoder

Dohyung Kwon, Jeongmin Yu
DOI: https://doi.org/10.1109/access.2024.3509988
IF: 3.9
2024-12-11
IEEE Access
Abstract: Unsupervised anomaly detection is well known for its ability to effectively identify and discern anomalies in data containing rare anomalies or diverse patterns, leading to broad applications across various research fields. However, this technology has not yet been extensively applied in the field of cultural heritage monitoring. In response, this paper proposes the RaViT-AE model, a new vision transformer-based autoencoder that implements region-attentive patch projection to perform anomaly detection in images using unsupervised learning techniques. Region-attentive patch projection enhances detection by applying higher-dimensional embeddings to regions of petroglyph images that show a higher likelihood of anomalies, effectively extracting features and recognizing complex patterns. Additionally, the introduction of F-SSIM loss facilitates effective model learning by considering both structural similarities and high-level semantic differences between original and reconstructed images. This study is conducted on a dataset of petroglyph images from Bangudae Terrace in Daegok-ri, Ulju, South Korea, collected from a fixed CCTV camera over more than one year. The results reveal that the proposed RaViT-AE model outperforms previous unsupervised anomaly detection models, including GAN and CNN-based autoencoders, achieving an AUC of 0.976, accuracy of 0.944, and F1-score of 0.936. This study demonstrates that the RaViT-AE model can significantly contribute to the continuous monitoring and protection of cultural heritage by robustly reconstructing images and accurately detecting anomalies.
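The abstract reports AUC, accuracy, and F1-score for a reconstruction-based detector. As a rough illustration of how such metrics are commonly derived from per-image reconstruction errors, here is a minimal sketch. It is not the paper's implementation: the function names (`mse`, `roc_auc`, `accuracy_and_f1`), the use of plain mean squared error as the anomaly score, and the fixed threshold are all assumptions made for illustration. The F-SSIM loss and region-attentive patch projection are not reproduced here.

```python
def mse(original, reconstructed):
    """Mean squared error between two equal-length flattened pixel lists.

    Assumption: used here as a stand-in anomaly score; the paper's own
    scoring function is not described in the abstract.
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def roc_auc(scores, labels):
    """AUC as the probability that a random anomaly (label 1) scores
    higher than a random normal sample (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(scores, labels, threshold):
    """Flag a sample as anomalous when its score exceeds the (hypothetical)
    threshold, then compute accuracy and F1 for the anomaly class."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    accuracy = (tp + tn) / len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

In this sketch, AUC is threshold-free, while accuracy and F1 depend on the chosen threshold, which in practice would typically be selected on held-out data.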
computer science, information systems, telecommunications, engineering, electrical & electronic