Hybrid CNN-Transformer based Meta-Learning Approach for Personalized Image Aesthetics Assessment
Xingao Yan, Feng Shao, Hangwei Chen, Qiuping Jiang
DOI: https://doi.org/10.1016/j.jvcir.2023.104044
IF: 2.887
2024-01-01
Journal of Visual Communication and Image Representation
Abstract: Personalized Image Aesthetics Assessment (PIAA) is highly subjective, as people's aesthetic preferences vary greatly. Traditional generic models struggle to capture the unique preferences of each individual, and PIAA typically has only a limited number of samples from each user. Furthermore, it requires holistic consideration of diverse visual features in an image, both local and global. To address these challenges, we propose a network that combines Transformers and Convolutional Neural Networks (CNNs) with Meta-Learning for PIAA (TCML-PIAA). First, we use both Vision Transformer (ViT) blocks and CNNs to extract long-range and short-range dependencies, mining richer, heterogeneous aesthetic attributes from the two branches. Second, to fuse these distinct features effectively, we introduce an Aesthetic Feature Interaction Module (AFIM), which integrates the aesthetic features extracted by the CNNs and the ViT and enables interaction and fusion of aesthetic information across the two branches. We also incorporate a Channel-Spatial Attention Module (CSAM), embedding it within both the CNNs and the AFIM to enhance the perception of different image regions and to further explore aesthetic cues. Experimental results demonstrate that TCML-PIAA outperforms existing state-of-the-art methods on benchmark databases.
computer science, information systems, software engineering
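The abstract mentions a Channel-Spatial Attention Module (CSAM) but gives no details of its design. As a rough illustration of the general idea only, the following is a minimal NumPy sketch of a generic channel-then-spatial attention gate in the style of CBAM. It is an assumption, not the paper's CSAM: the function name `channel_spatial_attention`, the weight matrices `w1` and `w2`, the reduction ratio, and the simplified spatial gate (an average of channel-pooled maps in place of a learned convolution) are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_spatial_attention(feat, w1, w2):
    """Generic channel-then-spatial gating (hypothetical, CBAM-style).

    feat: feature map of shape (C, H, W)
    w1:   shared MLP weights, shape (C // r, C)   (r = assumed reduction ratio)
    w2:   shared MLP weights, shape (C, C // r)
    """
    # Channel attention: pool over the spatial dimensions, then pass both
    # pooled vectors through the same two-layer MLP with a ReLU in between.
    avg = feat.mean(axis=(1, 2))  # (C,)
    mx = feat.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    channel_gate = sigmoid(mlp(avg) + mlp(mx))  # (C,), values in (0, 1)
    feat = feat * channel_gate[:, None, None]

    # Spatial attention: pool over channels. A learned convolution would
    # normally mix the two pooled maps; a plain average is assumed here
    # to keep the sketch free of trainable layers.
    pooled = 0.5 * (feat.mean(axis=0) + feat.max(axis=0))  # (H, W)
    spatial_gate = sigmoid(pooled)                          # values in (0, 1)
    return feat * spatial_gate[None, :, :]


# Toy usage with random features and weights (C=8, H=W=4, r=4).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))
out = channel_spatial_attention(feat, w1, w2)
```

Both gates lie strictly between 0 and 1, so the module rescales features without changing their shape or sign. In a trained network the weights would be learned jointly with the backbone rather than sampled at random.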