AesCLIP: Multi-Attribute Contrastive Learning for Image Aesthetics Assessment
Xiangfei Sheng,Leida Li,Pengfei Chen,Jinjian Wu,Weisheng Dong,Yuzhe Yang,Liwu Xu,Yaqian Li,Guangming Shi
DOI: https://doi.org/10.1145/3581783.3611969
2023-01-01
Abstract:Image aesthetics assessment (IAA) aims at predicting the aesthetic quality of images. Recently, large pre-trained vision-language models, like CLIP, have shown impressive performances on various visual tasks. When it comes to IAA, a straightforward way is to finetune the CLIP image encoder using aesthetic images. However, this can only achieve limited success without considering the uniqueness of multimodal data in the aesthetics domain. People usually assess image aesthetics according to fine-grained visual attributes, e.g., color, light and composition. However, how to learn aesthetics-aware attributes from CLIP-based semantic space has not been addressed before. With this motivation, this paper presents a CLIP-based multi-attribute contrastive learning framework for IAA, dubbed AesCLIP. Specifically, AesCLIP consists of two major components, i.e., aesthetic attribute-based comment classification and attribute-aware learning. The former classifies the aesthetic comments into different attribute categories. Then the latter learns an aesthetic attribute-aware representation by contrastive learning, aiming to mitigate the domain shift from the general visual domain to the aesthetics domain. Extensive experiments have been done by using the pre-trained AesCLIP on four popular IAA databases, and the results demonstrate the advantage of AesCLIP over the state-of-the-arts. The source code will be public at https://github.com/OPPOMKLab/AesCLIP.