It's All About Your Sketch: Democratising Sketch Control in Diffusion Models

Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
2024-03-12
Abstract: This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI. We importantly democratise the process, enabling amateur sketches to generate precise images, living up to the commitment of "what you sketch is what you get". A pilot study underscores the necessity, revealing that deformities in existing models stem from spatial-conditioning. To rectify this, we propose an abstraction-aware framework, utilising a sketch adapter, adaptive time-step sampling, and discriminative guidance from a pre-trained fine-grained sketch-based image retrieval model, working synergistically to reinforce fine-grained sketch-photo association. Our approach operates seamlessly during inference without the need for textual prompts; a simple, rough sketch akin to what you and I can create suffices! We welcome everyone to examine results presented in the paper and its supplementary. Contributions include democratising sketch control, introducing an abstraction-aware framework, and leveraging discriminative guidance, validated through extensive experiments.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the limited sketch control offered by existing diffusion models, in particular their poor performance on abstract freehand sketches. Specifically:

1. **Limitations of existing methods**: Existing sketch-conditioned image generation models (such as ControlNet and T2I-Adapter) mainly rely on precise edge maps. Given abstract freehand sketches, they tend to produce distorted, unrealistic outputs. They also usually need detailed text prompts to guide generation, which limits their practical use.
2. **The unfulfilled "what you sketch is what you get" promise**: Although some models claim to deliver "what you sketch is what you get", the generated images often fail to match the user's intent when the sketch is imprecise or abstract.
3. **Spatial-conditioning problems**: Existing models condition spatially, mapping the contours of the sketch directly onto the output image, so any deformity in the sketch carries over into the result. Balancing the influence of text and sketch requires manually tuning weight parameters, and because the optimal weights differ from sketch to sketch, no single setting works for all inputs.

To solve these problems, the paper proposes an **abstraction-aware framework** that enables diffusion models to better interpret abstract freehand sketches and generate high-quality images from them. Specific goals include:

- **Democratising sketch control**: Allowing simple sketches drawn by amateur users to generate accurate images, fulfilling the "what you sketch is what you get" promise.
- **Introducing an abstraction-aware mechanism**: Letting the model adapt to sketches at different levels of abstraction, improving the quality of the generated images.
- **Avoiding dependence on text prompts**: Requiring no text prompt at inference, so that a simple sketch alone is enough to generate an image.

Through these improvements, the paper aims to make sketch control more flexible and user-friendly without sacrificing image quality.
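The weight-tuning issue raised in point 3 can be illustrated with a toy example. The sketch below is not the paper's method and not the API of ControlNet or T2I-Adapter; it only assumes that spatial conditioning adds a sketch-derived residual to backbone features, scaled by a single manual weight. The names `apply_spatial_conditioning` and `contour_influence` are hypothetical, and features are reduced to short lists of floats for readability.

```python
from typing import List

Feature = List[float]


def apply_spatial_conditioning(
    backbone: Feature, sketch_residual: Feature, weight: float
) -> Feature:
    """Add a weighted sketch residual to backbone features, element-wise.

    Assumption: this stands in for how a spatial adapter injects sketch
    features into a diffusion backbone. `weight` is the manual knob.
    """
    if len(backbone) != len(sketch_residual):
        raise ValueError("feature vectors must have the same length")
    return [b + weight * s for b, s in zip(backbone, sketch_residual)]


def contour_influence(
    backbone: Feature, sketch_residual: Feature, weight: float
) -> float:
    """Share of total feature magnitude contributed by the weighted sketch."""
    sketch_mag = sum(abs(weight * s) for s in sketch_residual)
    backbone_mag = sum(abs(b) for b in backbone)
    total = sketch_mag + backbone_mag
    return sketch_mag / total if total else 0.0


if __name__ == "__main__":
    backbone = [1.0, 2.0, 3.0]          # toy backbone features
    wobbly_sketch = [0.5, -1.0, 2.0]    # toy residual from a rough sketch
    for w in (0.0, 0.5, 1.0):
        conditioned = apply_spatial_conditioning(backbone, wobbly_sketch, w)
        share = contour_influence(backbone, wobbly_sketch, w)
        print(f"weight={w}: features={conditioned}, sketch share={share:.3f}")
```

With a weight of 0 the sketch is ignored entirely, and as the weight grows the sketch's contours, deformities included, take up a larger share of the conditioned features. Under these assumptions, one fixed weight cannot suit both a precise edge map and a rough amateur sketch, which is the per-sketch tuning burden described above.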