Abstract: In recent years, Transformers, initially developed for language, have been successfully applied to visual tasks. Vision Transformers have been shown to push the state-of-the-art in a wide range of tasks, including image classification, object detection, and semantic segmentation. While ample research has shown promising results in art attribution and art authentication tasks using Convolutional Neural Networks, this paper examines if the superiority of Vision Transformers extends to art authentication, improving, thus, the reliability of computer-based authentication of artworks. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, we compare the art authentication performances of Swin Transformers with those of EfficientNet. Using a standard contrast set containing imitations and proxies (works by painters with styles closely related to van Gogh), we find that EfficientNet achieves the best performance overall. With a contrast set that only consists of imitations, we find the Swin Transformer to be superior to EfficientNet by achieving an authentication accuracy of over 85%. These results lead us to conclude that Vision Transformers represent a strong and promising contender in art authentication, particularly in enhancing the computer-based ability to detect artistic imitations.

What problem does this paper attempt to address?

The paper asks whether the advantage Vision Transformers have shown on other visual tasks carries over to art authentication, focusing on a comparison between Swin Transformers and EfficientNet. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, it evaluates how well each architecture separates authentic works from non-authentic ones. On the standard contrast set, which contains both imitations and proxies (works by painters with styles closely related to van Gogh), EfficientNet performs best overall. On the refined contrast set, which consists solely of imitations, the Swin Transformer surpasses EfficientNet with an authentication accuracy of over 85%.

The paper begins by introducing the importance and challenges of art attribution and authentication, including the limitations of digital images and the variability of expert knowledge. It then reviews the development of computer-assisted art attribution and authentication, from early methods for learning visual style, through research based on Convolutional Neural Networks (CNNs), to recent advances in Vision Transformers. For the experiments, ResNet101 and EfficientNet represent CNNs, and two versions of the Swin Transformer represent Vision Transformers. The experimental section details how the datasets were constructed (the authentic van Gogh set and the two contrast sets) and describes the data preprocessing, augmentation, and model training steps.

The results suggest that the Swin Transformer has an advantage in the narrower and more demanding task of distinguishing imitations from authentic works, while EfficientNet remains stronger when the contrast set also includes works by stylistically similar painters. In summary, the paper provides empirical evidence that Vision Transformers, and the Swin Transformer in particular, are a strong and promising contender in art authentication, especially for improving the computer-based detection of artistic imitations.
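As a rough illustration of the metric discussed above, authentication can be framed as binary classification (authentic vs. contrast), with accuracy as the fraction of paintings labelled correctly. The sketch below rests on that assumption; the summary does not spell out how the paper computes its accuracy figure. The function name `authentication_accuracy` and the toy labels are hypothetical and are not results from the paper.

```python
from typing import Sequence


def authentication_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of paintings labelled correctly (1 = authentic, 0 = contrast).

    Assumption: plain accuracy over all evaluated images; the paper may
    define or aggregate its metric differently.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one example")
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy, made-up labels purely for illustration; NOT data from the paper.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_accuracy = authentication_accuracy(toy_true, toy_pred)
```

Under this framing, the same function would be applied once per model and per contrast set, so that the scores for the standard set and the imitation-only set can be compared side by side.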

Art Authentication with Vision Transformers

A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis

Synthetic images aid the recognition of human-made art forgeries

A painting authentication method based on multi-scale spatial-spectral feature fusion and convolutional neural network

Comprehensive comparison between vision transformers and convolutional neural networks for face recognition tasks

Vision Transformers in 2022: An Update on Tiny ImageNet

Improving Vision Transformers by Revisiting High-Frequency Components

Explainable Vision Transformers for Vein Biometric Recognition

A Comprehensive Study of Vision Transformers on Dense Prediction Tasks

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Art Forgery Detection using Kolmogorov Arnold and Convolutional Neural Networks

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Vision Transformer with Sparse Scan Prior

Vision Transformers for Computer Go

Do Vision Transformers See Like Convolutional Neural Networks?

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Three things everyone should know about Vision Transformers

Adventures of Trustworthy Vision-Language Models: A Survey

Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis

Visualization Comparison of Vision Transformers and Convolutional Neural Networks