Creating Language-driven Spatial Variations of Icon Images

Xianghao Xu, Aditya Ganeshan, Karl D.D. Willis, Yewen Pu, Daniel Ritchie
2024-05-30
Abstract: Editing 2D icon images can require significant manual effort from designers. It involves manipulating multiple geometries while maintaining the logical or physical coherence of the objects depicted in the image. Previous language-driven image editing methods can change the texture and geometry of objects in the image but fail at producing spatial variations, i.e. modifying spatial relations between objects while maintaining their identities. We present a language-driven editing method that can produce spatial variations of icon images. Our method takes in an icon image along with a user's editing request text prompt and outputs an edited icon image reflecting the user's editing request. Our method is designed based on two key observations: (1) A user's editing requests can be translated by a large language model (LLM), with help from a domain-specific language (DSL) library, into a set of geometrical constraints defining the relationships between segments in an icon image. (2) Optimizing the affine transformations of the segments with respect to these geometrical constraints can produce icon images that fulfill the editing request and preserve overall physical and logical coherence. Quantitative and qualitative results show that our system outperforms multiple baselines, enabling natural editing of icon images.
Graphics
What problem does this paper attempt to address?
The paper addresses how to produce language-driven spatial variations when editing 2D icon images. Existing language-driven image editing methods can alter the texture and geometry of objects in an image, but they cannot produce spatial variations, i.e., modify the spatial relations between objects while preserving their identities. The paper proposes a language-driven editing method that does produce such variations for icon images. A large language model (LLM), supported by a domain-specific language (DSL) library, translates the user's editing request into geometric constraints on the segments of the icon; the method then optimizes the affine transformations of those segments to satisfy the constraints, yielding a new icon image that reflects the request. Because the LLM may not handle every constraint accurately, the paper also introduces a graph-based search that incrementally finds additional constraints to maintain the logical and physical consistency of the scene. Experimental results show that the method outperforms multiple baselines and allows natural editing of icon images.
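As a rough illustration of the second idea (optimizing segment transformations against geometric constraints), the Python sketch below treats each segment as a bounding box and minimizes the sum of squared constraint violations by plain gradient descent. It is not the paper's implementation: the `Box` class, the constraint helpers `width_is` and `touches_right_of`, the `solve` function, and the cup-and-handle scene are all hypothetical names and data invented for this example, and the transform is restricted to per-axis scale plus translation rather than a general affine map. The constraint list is written by hand here, standing in for what the LLM and DSL library would produce from a request such as "make the cup body wider".

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A segment reduced to its bounding box (center, width, height)."""
    cx: float
    cy: float
    w: float
    h: float

    @property
    def left(self):
        return self.cx - self.w / 2

    @property
    def right(self):
        return self.cx + self.w / 2


def transform(box, params):
    """Scale a box about its center, then translate it (a restricted affine map)."""
    sx, sy, tx, ty = params
    return Box(box.cx + tx, box.cy + ty, box.w * sx, box.h * sy)


# Hypothetical DSL-style constraints. Each returns a residual function
# that evaluates to zero when the constraint holds in the edited scene.
def width_is(name, target):
    return lambda scene: scene[name].w - target


def touches_right_of(anchor, other):
    return lambda scene: scene[other].left - scene[anchor].right


def solve(segments, constraints, steps=2000, lr=0.2, eps=1e-5):
    """Minimize the sum of squared residuals over all segment parameters."""
    names = list(segments)
    params = [1.0, 1.0, 0.0, 0.0] * len(names)  # start from the identity transform

    def scene(p):
        return {n: transform(segments[n], p[4 * i:4 * i + 4])
                for i, n in enumerate(names)}

    def loss(p):
        s = scene(p)
        return sum(c(s) ** 2 for c in constraints)

    for _ in range(steps):
        grad = []
        for i in range(len(params)):  # central finite differences
            up, down = params[:], params[:]
            up[i] += eps
            down[i] -= eps
            grad.append((loss(up) - loss(down)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return scene(params)


# Toy icon in unit coordinates: a cup body with a handle on its right edge.
icon = {"body": Box(0.4, 0.5, 0.4, 0.6), "handle": Box(0.7, 0.5, 0.2, 0.3)}
constraints = [
    width_is("body", 1.5 * icon["body"].w),  # the requested edit
    width_is("handle", icon["handle"].w),    # the handle keeps its size
    touches_right_of("body", "handle"),      # the handle stays attached
]
edited = solve(icon, constraints)
print(edited["body"], edited["handle"])
```

Starting from the identity transform means parameters that no constraint touches receive a zero gradient and stay unchanged, which is one simple way to keep the rest of the icon intact. The attachment constraint plays the role that the summary assigns to the additional constraints found by search: without it, widening the body would leave the handle overlapping or detached.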