CLIP-Based Composition-Aware Image Cropping

Shuo Zhang, Xinyu Yang, Xiwen Bai, Yu Li
DOI: https://doi.org/10.1109/icip51287.2024.10647571
2024-01-01
Abstract: Image cropping aims to enhance the aesthetic quality of images by adjusting their composition. Although previous works have made progress in capturing general aesthetic features, they remain limited in extracting information related to image composition. Motivated by this, we propose a composition-aware image cropping method. Specifically, we present a CLIP Composition Module, which is the first to transfer visual-language models to image composition understanding. This gives our model a better ability to obtain image composition features without relying on any predefined composition rules. During training, we use a pair-wise learning-to-rank approach that focuses on crops from the same image in order to learn more effective composition representations. Moreover, we design a saliency-based method for generating candidate crops, which aims to cover a broader range of sizes and to target diverse objects. Extensive experiments on two benchmark datasets demonstrate the superiority of our model over other state-of-the-art methods.
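To make the pair-wise learning-to-rank idea concrete, the following is a minimal sketch of a ranking loss over crops taken from a single image. The abstract does not state the exact loss, so the hinge form, the `margin` parameter, and the function name `pairwise_ranking_loss` are assumptions for illustration only, not the authors' implementation.

```python
from typing import Sequence


def pairwise_ranking_loss(
    scores: Sequence[float],
    labels: Sequence[float],
    margin: float = 1.0,  # assumed hyperparameter, not given in the abstract
) -> float:
    """Mean hinge loss over ordered crop pairs from one image.

    scores: predicted quality of each candidate crop.
    labels: ground-truth quality of each candidate crop (higher is better).
    A pair (i, j) contributes only when labels[i] > labels[j]; it is
    penalized unless scores[i] exceeds scores[j] by at least `margin`.
    """
    total, count = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += max(0.0, margin - (scores[i] - scores[j]))
                count += 1
    # No comparable pairs (e.g. all labels equal) yields zero loss.
    return total / count if count else 0.0
```

Because every pair is drawn from the same image, the loss compares only crops that share content, so the score differences it shapes reflect framing rather than differences in subject matter.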