Text-guided Multi-Task Image Aesthetic Quality Assessment

Hongtao Yang, Guolong Wang, Yehui Liu, Ping Shi, Xinghui Zhou, Xin Jin
DOI: https://doi.org/10.1145/3688867.3690176
2024-01-01
Abstract: In image aesthetic quality assessment, additional tagging information, such as scene classification, photographic style, and aesthetic attributes, carries rich aesthetic meaning. Textual descriptions and visual features constructed from this information are often strongly complementary, giving a more comprehensive view of an image's aesthetic value. To exploit this information and improve image aesthetic quality assessment, we develop a Text-guided Multitask Image Aesthetic Quality Assessment model (TM-IAQA). Built on the Contrastive Language-Image Pre-training (CLIP) model, it jointly learns the textual descriptions and visual features of images from a multitask learning perspective, and it improves assessment performance by transferring auxiliary aesthetic knowledge. TM-IAQA performs multitask aesthetic quality assessment through several task branches: CLIP-based semantic and stylistic branches, multi-attribute branches, and multi-theme classification branches. Experimental results on three public image aesthetic assessment datasets validate the efficacy and generalization capability of this multimodal model, demonstrating that it can effectively transfer the prior knowledge of the CLIP model in the image domain to aesthetic quality assessment.
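The multitask idea in the abstract can be illustrated with a minimal sketch of a combined training objective: a main aesthetic-score loss plus weighted auxiliary classification losses. This is not the authors' implementation. The abstract does not state the loss functions, branch names, or weights, so the function names (`mse`, `cross_entropy`, `multitask_loss`), the choice of mean squared error and cross-entropy, and the weights below are all assumptions made for illustration.

```python
import math


def mse(pred, target):
    """Mean squared error between predicted and target aesthetic scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Cross-entropy of one sample, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multitask_loss(score_pred, score_true, aux_logits, aux_labels, aux_weights):
    """Main score loss plus a weighted sum of auxiliary branch losses.

    aux_logits, aux_labels, and aux_weights are dicts keyed by a hypothetical
    branch name (for example "style" or "theme"); these names are assumed here
    and are not taken from the paper.
    """
    loss = mse(score_pred, score_true)
    for name, logits in aux_logits.items():
        loss += aux_weights[name] * cross_entropy(logits, aux_labels[name])
    return loss


# Example: one image, a score branch and one auxiliary "style" branch.
total = multitask_loss(
    score_pred=[0.62],
    score_true=[0.70],
    aux_logits={"style": [1.2, -0.3, 0.1]},
    aux_labels={"style": 0},
    aux_weights={"style": 0.5},
)
```

In a sketch like this, the auxiliary weights control how strongly each auxiliary task shapes the shared representation; setting a weight to zero removes that branch from the objective.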