Art Authentication with Vision Transformers

Ludovica Schaerf, Carina Popovici, Eric Postma
2023-07-10
Abstract: In recent years, Transformers, initially developed for language, have been successfully applied to visual tasks. Vision Transformers have been shown to push the state-of-the-art in a wide range of tasks, including image classification, object detection, and semantic segmentation. While ample research has shown promising results in art attribution and art authentication tasks using Convolutional Neural Networks, this paper examines if the superiority of Vision Transformers extends to art authentication, improving, thus, the reliability of computer-based authentication of artworks. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, we compare the art authentication performances of Swin Transformers with those of EfficientNet. Using a standard contrast set containing imitations and proxies (works by painters with styles closely related to van Gogh), we find that EfficientNet achieves the best performance overall. With a contrast set that only consists of imitations, we find the Swin Transformer to be superior to EfficientNet by achieving an authentication accuracy of over 85%. These results lead us to conclude that Vision Transformers represent a strong and promising contender in art authentication, particularly in enhancing the computer-based ability to detect artistic imitations.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper investigates whether Vision Transformers improve on Convolutional Neural Networks (CNNs) in the task of art authentication, focusing on a comparison between Swin Transformers and EfficientNet. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, it evaluates how well each architecture separates authentic works from non-authentic ones. With the standard contrast set, which contains imitations and proxies (works by painters with styles closely related to van Gogh), EfficientNet performs best overall. With the contrast set composed solely of imitations, the Swin Transformer surpasses EfficientNet, reaching an authentication accuracy of over 85%. The authors conclude that Vision Transformers are a strong and promising contender in art authentication, especially for detecting imitations.
The paper begins by introducing the importance and challenges of art attribution and authentication, including the limitations of digital images and the variability of expert knowledge. It then reviews the development of computer-assisted art attribution and authentication techniques, from early visual style learning methods to research based on CNNs and on to recent advances in Vision Transformers. For the experiments, it selects ResNet101 and EfficientNet as representative CNNs and two versions of the Swin Transformer as representative Vision Transformers. The experimental section details how the datasets were constructed (the authentic van Gogh dataset and the two contrast datasets) and describes the data preprocessing, data augmentation, and model training steps.
The results show that EfficientNet performs best on the standard contrast set, which includes works by artists with similar styles, while the Swin Transformer performs better on the refined contrast set composed only of imitations. This suggests that the Swin Transformer has an advantage when the task is specifically to distinguish imitations from authentic works. In summary, the paper provides empirical evidence for the potential of Vision Transformers, especially the Swin Transformer, as a tool for computer-assisted art authentication.
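To make the notion of "authentication accuracy" concrete, the sketch below shows one plausible way to score a binary authentic-versus-contrast classifier. It is an illustration only: the summary above does not specify how the paper computes its metric, and the function name `authentication_report`, the 1/0 label encoding, and the example predictions are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class AuthenticationReport:
    accuracy: float          # fraction of all works classified correctly
    authentic_recall: float  # fraction of authentic works accepted as authentic
    contrast_recall: float   # fraction of contrast works (e.g. imitations) rejected


def authentication_report(labels, predictions):
    """Summarise binary authentication results.

    Assumed encoding (not taken from the paper): 1 = authentic, 0 = contrast.
    """
    if len(labels) != len(predictions) or len(labels) == 0:
        raise ValueError("labels and predictions must be non-empty and equal in length")

    pairs = list(zip(labels, predictions))
    n_authentic = sum(1 for y, _ in pairs if y == 1)
    n_contrast = len(pairs) - n_authentic
    if n_authentic == 0 or n_contrast == 0:
        raise ValueError("both classes must be present to report per-class recall")

    correct = sum(1 for y, p in pairs if y == p)
    authentic_hits = sum(1 for y, p in pairs if y == 1 and p == 1)
    contrast_hits = sum(1 for y, p in pairs if y == 0 and p == 0)

    return AuthenticationReport(
        accuracy=correct / len(pairs),
        authentic_recall=authentic_hits / n_authentic,
        contrast_recall=contrast_hits / n_contrast,
    )


# Made-up toy data: four authentic works followed by four imitations.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]

report = authentication_report(labels, predictions)
print(report)
```

Reporting the two per-class recalls alongside overall accuracy is a design choice of this sketch: when the contrast set changes (imitations plus proxies versus imitations only), a single accuracy figure can hide whether errors come from rejected authentic works or from accepted imitations.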