Steering Large Text-to-Image Model for Abstract Art Synthesis: Preference-based Prompt Optimization and Visualization

Aven-Le Zhou, Wei Wu, Yu-Ao Wang, Kang Zhang
2024-11-18
Abstract: With the advancement of neural generative capabilities, the art community has increasingly embraced GenAI (Generative Artificial Intelligence), particularly large text-to-image models, for producing aesthetically compelling results. However, the process often lacks determinism and requires tedious trial and error, as users struggle to devise effective prompts to achieve their desired outcomes. This paper introduces a prompting-free generative approach that applies a genetic algorithm and real-time iterative human feedback to optimize prompt generation, enabling the creation of user-preferred abstract art through a customized Artist Model. The proposed two-part approach begins with constructing an Artist Model capable of deterministically generating abstract art in specific styles, e.g., Kandinsky's Bauhaus style. The second phase integrates real-time user feedback to optimize the prompt generation and obtains an Optimized Prompting Model, which adapts to user preferences and generates prompts automatically. When combined with the Artist Model, this approach allows users to create abstract art tailored to their personal preferences and artistic style.
Human-Computer Interaction
What problem does this paper attempt to address?
The main problem this paper addresses is that, when using large-scale text-to-image models to generate abstract art, users find it difficult to obtain results that match their personal preferences by writing prompts manually. Specifically:

1. **Non-determinism**: Current text-to-image generation systems rely on natural-language prompts provided by users. Because these prompts are open-ended and ambiguous, the generated results are often highly random and are not guaranteed to meet users' expectations.
2. **Challenges in prompt engineering**: Users have to keep trying different prompts to improve the results. This process is cumbersome and offers little guidance, which makes it hard to produce satisfactory abstract artworks efficiently.
3. **Customization requirements**: Existing large-scale text-to-image models are usually general-purpose and cannot easily be customized for a specific artist's style or a user's preferences.

To solve these problems, the paper proposes a method that combines "semantic injection" with "prompt optimization with genetic algorithms" to automate the generation of abstract art that conforms to user preferences. The specific steps are as follows:

- **Semantic injection**: The pre-trained large-scale text-to-image model is fine-tuned so that it can generate abstract art in a specific style (for example, Kandinsky's Bauhaus style). This step is achieved through FastLoRA and DiffLoRA techniques, which encode discrete and continuous attribute values into the model.
- **Genetic algorithm-based prompt optimization**: Genetic algorithms and real-time user feedback are used to optimize the prompt generation process automatically. Users vote for the generated results that best match their preferences, and the prompts are optimized step by step until they form an "optimized prompt model".
With these two components, users can generate abstract artworks that match their personal preferences without writing prompts manually, which makes the generation process more controllable and efficient.
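
The preference-driven prompt optimization described above can be illustrated with a minimal sketch. It is not the paper's implementation: the attribute vocabulary `ATTRIBUTES`, the prompt template in `to_prompt`, the elitist selection scheme, and the `simulated_vote` function that stands in for a real user's votes are all assumptions made for illustration. In a real system, each candidate would be rendered by the fine-tuned model and scored by a person.

```python
import random

# Hypothetical attribute vocabulary; the paper's actual attributes are not listed here.
ATTRIBUTES = {
    "shape": ["circles", "triangles", "lines", "grids"],
    "palette": ["warm", "cool", "primary", "monochrome"],
    "density": ["sparse", "balanced", "dense"],
}


def random_genome(rng):
    """Pick one value per attribute; a genome encodes one candidate prompt."""
    return {name: rng.choice(values) for name, values in ATTRIBUTES.items()}


def to_prompt(genome):
    """Render a genome as a text prompt (the template is an assumption)."""
    return (
        f"abstract art, {genome['density']} {genome['shape']}, "
        f"{genome['palette']} palette"
    )


def crossover(a, b, rng):
    """Uniform crossover: each attribute is inherited from one of two parents."""
    return {name: rng.choice([a[name], b[name]]) for name in ATTRIBUTES}


def mutate(genome, rng, rate=0.2):
    """Resample each attribute with probability `rate`."""
    return {
        name: rng.choice(ATTRIBUTES[name]) if rng.random() < rate else value
        for name, value in genome.items()
    }


def evolve(vote, generations=10, pop_size=8, keep=3, seed=0):
    """Evolve genomes toward higher `vote` scores, keeping the top `keep` each round."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # In an interactive system, `vote` would be the user's rating of the image.
        ranked = sorted(population, key=vote, reverse=True)
        parents = ranked[:keep]
        children = []
        while len(children) < pop_size - keep:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=vote)


def simulated_vote(genome):
    """Stand-in for human feedback: counts matches with a hidden preference."""
    target = {"shape": "circles", "palette": "primary", "density": "balanced"}
    return sum(genome[name] == target[name] for name in target)


if __name__ == "__main__":
    best = evolve(simulated_vote)
    print(to_prompt(best))
```

Because the top-voted candidates are carried over unchanged into the next generation, the best score never decreases between rounds. That matters when the votes come from a person, since a liked result should not be lost to mutation.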