Not Only Generative Art: Stable Diffusion for Content-Style Disentanglement in Art Analysis

Yankun Wu, Yuta Nakashima, Noa Garcia
DOI: https://doi.org/10.1145/3591106.3592262
2023-04-20
Abstract: The duality of content and style is inherent to the nature of art. For humans, these two elements are clearly different: content refers to the objects and concepts in the piece of art, and style to the way it is expressed. This duality poses an important challenge for computer vision. The visual appearance of objects and concepts is modulated by the style that may reflect the author's emotions, social trends, artistic movement, etc., and their deep comprehension undoubtfully requires to handle both. A promising step towards a general paradigm for art analysis is to disentangle content and style, whereas relying on human annotations to cull a single aspect of artworks has limitations in learning semantic concepts and the visual appearance of paintings. We thus present GOYA, a method that distills the artistic knowledge captured in a recent generative model to disentangle content and style. Experiments show that synthetically generated images sufficiently serve as a proxy of the real distribution of artworks, allowing GOYA to separately represent the two elements of art while keeping more information than existing methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the disentanglement of content and style in art analysis. The content of an artwork (the depicted objects, figures, or scenes) and its style (color, composition, shape, and other visual forms of expression) are two basic elements of art analysis. From the perspective of computer vision, however, the boundary between the two is not clear, which makes an in-depth understanding of artworks difficult. Existing methods usually rely on human annotations to isolate a single aspect of an artwork (either content or style), which limits how well they learn both the semantic concepts and the visual appearance of paintings. The authors therefore propose GOYA, a method that distills the artistic knowledge captured by a recent generative model (Stable Diffusion) to disentangle the content and style of artworks. Their experiments show that synthetically generated images serve as a sufficient proxy for the distribution of real artworks, which allows GOYA to represent the two elements separately while retaining more information than existing methods. The method thus improves the analysis of content and style in artworks, and may also offer new tools and perspectives for digital humanities research.