Abstract: Large language models (LLMs) have made tremendous progress in natural language understanding, and they have also been successfully adopted in other domains such as computer vision, robotics, and reinforcement learning. In this work, we apply LLMs to image generation tasks by directly generating the virtual brush strokes that paint an image. We present Painter, an LLM that converts user prompts written as text descriptions into sketches by generating the corresponding brush strokes auto-regressively. We build Painter on an off-the-shelf LLM pre-trained on a large text corpus, fine-tuning it on the new task while preserving its language understanding capabilities. We create a dataset of diverse multi-object sketches paired with textual prompts that covers several object types and tasks. Painter can generate sketches from text descriptions, remove objects from the canvas, and detect and classify objects in sketches. Although this is pioneering work in using LLMs for auto-regressive image generation, the results are very encouraging.
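The abstract does not say how brush strokes are represented as LLM tokens. As a minimal sketch, assuming a hypothetical plain-text encoding in which each stroke is a list of integer canvas coordinates wrapped in `<stroke>`/`</stroke>` markers, serializing a sketch for the model and parsing its output back could look like this. `GRID`, `quantize`, `strokes_to_text`, and `text_to_strokes` are illustrative names, not Painter's actual interface.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Stroke = List[Point]

GRID = 256  # hypothetical canvas resolution: coordinates lie in [0, GRID - 1]


def quantize(x: float, y: float, grid: int = GRID) -> Point:
    """Map normalized coordinates in [0, 1] to integer grid cells, clamping outliers."""
    def q(v: float) -> int:
        return min(grid - 1, max(0, round(v * (grid - 1))))
    return (q(x), q(y))


def strokes_to_text(strokes: List[Stroke]) -> str:
    """Serialize strokes into one flat string a language model could emit left to right."""
    parts = []
    for stroke in strokes:
        coords = " ".join(f"{x},{y}" for x, y in stroke)
        parts.append(f"<stroke> {coords} </stroke>")
    return " ".join(parts)


def text_to_strokes(text: str) -> List[Stroke]:
    """Parse (possibly sampled) model output back into strokes, skipping malformed tokens."""
    strokes: List[Stroke] = []
    current: Optional[Stroke] = None
    for tok in text.split():
        if tok == "<stroke>":
            current = []  # start a new stroke; an unterminated previous one is dropped
        elif tok == "</stroke>":
            if current:  # keep only non-empty strokes that were properly opened
                strokes.append(current)
            current = None
        elif current is not None:
            xs, sep, ys = tok.partition(",")
            if sep and xs.isdigit() and ys.isdigit():
                current.append((int(xs), int(ys)))
    return strokes
```

With a format like this, an unmodified text LLM can emit a sketch token by token, and each closing marker is a natural point to re-render the canvas. The parser is deliberately lenient because sampled output is not guaranteed to be well-formed.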

What problem does this paper attempt to address?

The main problem this paper addresses is how to apply large language models (LLMs) to image generation, specifically by painting an image with a sequence of virtual brush strokes. The paper introduces Painter, an LLM-based system that generates sketches from text descriptions. Unlike existing image-generation methods, Painter imitates the way humans paint: it builds up an image by generating a series of strokes auto-regressively.

### Main Problems

1. **Applying LLMs to image generation**: Although existing image-generation methods achieve remarkable results, they lack interpretability, and their inherent flaws are difficult to fix. Painter offers an alternative in which an LLM generates the strokes that draw the image, a process closer to how humans paint.
2. **Multi-object sketch generation**: Existing datasets such as Quick-Draw contain only single-object sketches and lack detailed text descriptions. The paper therefore builds a new dataset, Multi-Object-Quick-Draw, whose multi-object sketches are annotated with the objects' relationships and positions, so that Painter can learn to generate more complex multi-object sketches.
3. **Multi-task ability**: Besides generating sketches, Painter can complete partial sketches, remove objects from the canvas, reproduce given sketches, and detect and classify objects in sketches. These auxiliary tasks are meant to improve performance on the main task and make the model more versatile.

### Solutions

1. **Dataset construction**: The authors create the Multi-Object-Quick-Draw dataset of diverse multi-object sketches paired with text descriptions. Beyond single objects, the sketches carry labels for the relationships and relative positions between objects.
2. **Model design**: The authors modify an existing pre-trained LLM by adding residual cross-attention layers so that it can handle interleaved text and image inputs. They also introduce a visual feedback loop that lets the model observe the state of the canvas in real time while it draws.
3. **Training method**: The model is trained with the standard masked cross-entropy loss, so that it learns to interpret the text description and generate the corresponding strokes.

### Contributions

1. **First use of LLMs for auto-regressive image generation**: The authors present Painter as the first model to use an LLM for auto-regressive image generation.
2. **A new dataset**: Multi-Object-Quick-Draw contains diverse multi-object sketches with detailed relationship and position labels, a useful resource for training models that generate complex scenes.
3. **Enhanced visual grounding**: The visual feedback loop, cross-attention layers, and multi-task training improve both the model's image-generation performance and its interpretability.

In conclusion, the paper tackles the challenge of applying LLMs to image generation by introducing Painter and shows the potential of this approach for generating complex multi-object sketches.
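The training objective is described only as a "standard masked cross-entropy" loss. A common reading, assumed here rather than taken from the paper, is that prompt positions are masked out so the loss averages -log p(y_t | y_<t) only over positions t with mask m_t = 1 (the stroke tokens the model must generate). The minimal pure-Python sketch below computes exactly that; `masked_cross_entropy` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def masked_cross_entropy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    mask: Sequence[int],
) -> float:
    """Mean negative log-likelihood over positions where mask == 1.

    logits[t] holds next-token scores at position t, targets[t] is the token
    that should be predicted there, and mask[t] = 0 excludes the position
    (assumed here: prompt/text tokens) from the loss.
    """
    if not (len(logits) == len(targets) == len(mask)):
        raise ValueError("logits, targets and mask must have the same length")
    total, count = 0.0, 0
    for row, y, m in zip(logits, targets, mask):
        if m:
            total -= log_softmax(row)[y]  # add -log p(y_t | y_<t)
            count += 1
    if count == 0:
        raise ValueError("mask selects no positions")
    return total / count
```

In practice this would be computed over batched tensors, for example with PyTorch's `torch.nn.functional.cross_entropy` and its `ignore_index` argument marking masked positions, but the arithmetic is the same.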

Painter: Teaching Auto-regressive Language Models to Draw Sketches

EasyPainter: Customizing Your Own Paintings

AI-Sketcher : A Deep Generative Model for Producing High-Quality Sketches.

SmartPaint: a Co-Creative Drawing System Based on Generative Adversarial Networks

LLMGA: Multimodal Large Language Model based Generation Assistant

Sketch: A Toolkit for Streamlining LLM Operations

LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators

Could ChatGPT Imagine: Content Control for Artistic Painting Generation Via Large Language Models

ProcessPainter: Learn Painting Process from Sequence Data

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

SketchGPT: Autoregressive Modeling for Sketch Generation and Recognition

Creative Painting with Latent Diffusion Models

Re-Thinking Inverse Graphics With Large Language Models

Learning Realistic Sketching: A Dual-agent Reinforcement Learning Approach

Elucidating the design space of language models for image generation

Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings

PIXAR: Auto-Regressive Language Modeling in Pixel Space

Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding

A Vision Check-up for Language Models

Line Artist: A Multiple Style Sketch to Painting Synthesis Scheme

Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Wasserstein Generative Adversarial Networks