Abstract: Photo cropping is a widely used tool in the printing industry, photography, and cinematography. Conventional cropping models face three challenges. First, they deemphasize semantic content, which is many times more important than low-level features in photo aesthetics. Second, they lack a sequential ordering, whereas humans look at semantically important regions sequentially when viewing a photo. Third, they have difficulty leveraging input from multiple users, which is particularly critical in cropping because photo assessment is a highly subjective task. To address these challenges, this paper proposes semantics-aware photo cropping, which crops a photo by simulating the process of humans sequentially perceiving its semantically important regions. We first project the local features (graphlets in this paper) onto a semantic space constructed from the category information of the training photos. An efficient learning algorithm is then derived to sequentially select semantically representative graphlets of a photo; the selection process can be interpreted as a path that simulates humans actively perceiving semantics in a photo. Furthermore, we learn a prior distribution of such active graphlet paths from training photos that multiple users have marked as aesthetically pleasing. The learned prior constrains the active graphlet path of a test photo to be maximally similar to those of the training photos. Experimental results show that 1) the active graphlet path accurately predicts human gaze shifting and is thus more indicative of photo aesthetics than conventional saliency maps, and 2) the cropped photos produced by our approach outperform those of its competitors in both qualitative and quantitative comparisons.
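
The idea of selecting regions one at a time so that the selection order forms a path can be pictured with a small sketch. The abstract does not specify the learning algorithm, so the code below is not the paper's method: it substitutes a plain greedy farthest-point rule as a stand-in, and treats each graphlet as a feature vector already projected into some semantic space. The names `greedy_path`, `features`, `k`, and `start` are hypothetical and chosen only for illustration.

```python
import math


def greedy_path(features, k, start=0):
    """Return the indices of k feature vectors, picked one at a time.

    Assumption: a greedy farthest-point rule stands in for the paper's
    (unspecified) learning algorithm. Each step picks the vector that is
    farthest, in Euclidean distance, from everything chosen so far, so the
    ordered result can be read as a path over the regions.
    """
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(features)")
    if not 0 <= start < n:
        raise ValueError("start must be a valid index into features")

    path = [start]
    chosen = {start}
    # min_dist[i] is the distance from vector i to its nearest chosen vector.
    min_dist = [math.dist(f, features[start]) for f in features]

    while len(path) < k:
        # Only consider vectors that have not been selected yet.
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        path.append(nxt)
        chosen.add(nxt)
        # Refresh nearest-chosen distances with the newly added vector.
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[nxt]))
    return path


if __name__ == "__main__":
    # Four toy "graphlet" vectors in a 2-D space (made-up values).
    toy = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.0, 5.0)]
    print(greedy_path(toy, k=3))
```

The returned list is ordered, which is the point of the sketch: the order in which regions are picked is what the abstract interprets as a gaze-like path. How the paper scores candidates, and how the learned prior over paths enters the selection, is not covered here.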

Clip-Based Composition-Aware Image Cropping

Image Cropping with Composition and Saliency Aware Aesthetic Score Map

ClipCrop: Conditioned Cropping Driven by Vision-Language Model

Image Re-composition Via Regional Content-Style Decoupling

An End-to-End Neural Network for Image Cropping by Learning Composition from Aesthetic Photos

Aesthetic image cropping meets VLP: Enhancing good while reducing bad

Automatic Image Cropping Using Sparse Coding

Beyond Image Borders: Learning Feature Extrapolation for Unbounded Image Composition

Image Cropping under Design Constraints

Focusing on your subject: Deep subject-aware image composition recommendation networks

Composition-Aware Image Aesthetics Assessment

Find Beauty in the Rare: Contrastive Composition Feature Clustering for Nontrivial Cropping Box Regression

A composition-oriented aesthetic view recommendation network supervised by the simplified golden ratio theory

View adjustment: helping users improve photographic composition

Repurposing existing deep networks for caption and aesthetic-guided image cropping

Actively Learning Human Gaze Shifting Paths for Semantics-Aware Photo Cropping

Enhancing Historical Image Retrieval with Compositional Cues

Learning Subject-Aware Cropping by Outpainting Professional Photos

Cropper: Vision-Language Model for Image Cropping through In-Context Learning

Human-centric Image Cropping with Partition-aware and Content-preserving Features

Quantitative Analysis of Automatic Image Cropping Algorithms: A Dataset and Comparative Study