AesMamba: Universal Image Aesthetic Assessment with State Space Models
Fei Gao, Yuhao Lin, Jiaqi Shi, Maoying Qiao, Nannan Wang
DOI: https://doi.org/10.1145/3664647.3681011
2024-01-01
Abstract: Image Aesthetic Assessment (IAA) aims to objectively predict generic or personalized evaluations of overall aesthetics or fine-grained attributes, based on visual or multimodal inputs. Previously, researchers have designed diverse, specialized methods for specific IAA tasks, each tied to a particular input-output setting. Is it possible to design a universal IAA framework applicable to the whole IAA task taxonomy? In this paper, we explore this question and propose a modular IAA framework, dubbed AesMamba. Specifically, we use the Visual State Space Model (VMamba), instead of CNNs or ViTs, to learn comprehensive representations of aesthetic-related attributes, because VMamba can efficiently achieve both global and local effective receptive fields. Afterward, a modal-adaptive module automatically produces integrated representations, conditioned on the type of input. In the prediction module, we propose a Multitask Balanced Adaptation (MBA) module to boost task-specific features, with emphasis on tail instances. Finally, we formulate the personalized IAA task as a multimodal learning problem by converting a user's anonymous subject characteristics into a text prompt. This prompting strategy effectively exploits the semantics of flexibly selected characteristics to infer individual preferences. AesMamba can be applied to diverse IAA tasks through flexible combination of these modules. Extensive experiments on numerous datasets demonstrate that AesMamba consistently achieves superior or competitive performance on all IAA tasks, in comparison with previous SOTA methods. The code has been released at https://github.com/AiArt-Gao/AesMamba.
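
The prompting strategy for personalized IAA can be illustrated with a minimal sketch. The abstract does not give the prompt template, so the function name `build_user_prompt`, the trait keys, and the sentence format below are all hypothetical and are not taken from the released code. The sketch only shows the general idea of turning a flexibly selected subset of a user's anonymous traits into a text prompt that a text encoder could consume.

```python
from typing import Mapping, Optional, Sequence


def build_user_prompt(
    traits: Mapping[str, object],
    selected: Optional[Sequence[str]] = None,
) -> str:
    """Turn anonymous user traits into a text prompt.

    Hypothetical template: the paper's actual wording is not given
    in the abstract, so this format is an assumption.
    """
    # Use the caller's selection if given, else every trait in sorted order.
    keys = list(selected) if selected is not None else sorted(traits)
    # Skip selected keys that the user did not provide.
    parts = [
        f"{key.replace('_', ' ')}: {traits[key]}"
        for key in keys
        if key in traits
    ]
    if not parts:
        return "A user with unknown characteristics."
    return "A user with " + ", ".join(parts) + "."


# Illustrative traits only; real datasets define their own fields.
example_traits = {"age": 30, "gender": "female", "art_experience": "beginner"}
example_prompt = build_user_prompt(example_traits, ["age", "art_experience"])
```

Selecting traits by key is one simple way to mimic the "flexibly selected" aspect: the same user record yields different prompts depending on which traits are available or relevant for a given dataset.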