CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion

Nisha Huang, Weiming Dong, Yuxin Zhang, Fan Tang, Ronghui Li, Chongyang Ma, Xiu Li, Changsheng Xu
2024-01-30
Abstract: Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images. However, adapting these models for artistic image editing presents two significant challenges. Firstly, users struggle to craft textual prompts that meticulously detail visual elements of the input image. Secondly, prevalent models, when effecting modifications in specific zones, frequently disrupt the overall artistic style, complicating the attainment of cohesive and aesthetically unified artworks. To surmount these obstacles, we build the innovative unified framework CreativeSynth, which is based on a diffusion model with the ability to coordinate multimodal inputs and multitask in the field of artistic image generation. By integrating multimodal features with customized attention mechanisms, CreativeSynth facilitates the importation of real-world semantic content into the domain of art through inversion and real-time style transfer. This allows for the precise manipulation of image style and content while maintaining the integrity of the original model parameters. Rigorous qualitative and quantitative evaluations underscore that CreativeSynth excels in enhancing artistic images' fidelity and preserves their innate aesthetic essence. By bridging the gap between generative models and artistic finesse, CreativeSynth becomes a custom digital palette.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper attempts to address two key challenges in artistic image editing and generation:

1. **Users find it difficult to create precise text prompts**: Existing large-scale text-to-image generation models can synthesize high-quality images, but in artistic image editing, users find it challenging to describe the visual elements of the input image in detail through text prompts. This makes it difficult for users to express their creative intent accurately.
2. **Inconsistent style when modifying specific areas**: Current models often disrupt the overall artistic style when modifying specific areas of an image, so the results lack uniformity and aesthetic integrity. Local modifications that preserve the overall style and aesthetic consistency of the artwork are therefore very difficult.

To overcome these challenges, the paper proposes a unified framework, CreativeSynth. The framework is based on diffusion models and can coordinate multi-modal inputs, supporting multiple tasks in artistic image generation. By integrating multi-modal features with customized attention mechanisms, CreativeSynth can precisely control the style and content of images while leaving the original model parameters unchanged, thereby generating high-fidelity and realistic artistic works.

Specifically, the main contributions of CreativeSynth include:

- **Introducing a unified artistic framework for multi-modal, multi-task processing**, allowing users to edit any artistic image on a single platform.
- **Employing advanced aesthetic maintenance, semantic fusion, and inverse encoding techniques**, ensuring that the intrinsic expression of artistic images is preserved when integrating multi-modal semantic information, significantly improving the coherence of the works on both macro and micro levels, and enabling personalized creation.
- **Demonstrating experimentally** that CreativeSynth outperforms other existing methods in the field of artistic image fusion and synthesis.

Through these technological innovations, CreativeSynth not only enhances the quality of artistic image generation but also provides users with more flexible and precise editing tools, enabling them to achieve personalized creation while maintaining the original style and aesthetic characteristics of the artwork.
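The summary does not spell out how the "customized attention mechanisms" combine text and image features, so the following is only an illustrative sketch of one common way to fuse two modalities in attention: attend to text tokens and image tokens separately and add the results. The function names, the `image_weight` parameter, and the additive fusion rule are assumptions made for illustration, not the method confirmed by the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse_text_and_image(queries, text_tokens, image_tokens, image_weight=1.0):
    """Hypothetical fusion: attend to each modality separately, then add.

    image_weight (an assumed knob) scales how strongly the image
    features influence the result; 0.0 falls back to text-only attention.
    """
    text_out = attend(queries, text_tokens, text_tokens)
    image_out = attend(queries, image_tokens, image_tokens)
    return [[t + image_weight * i for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(text_out, image_out)]


if __name__ == "__main__":
    queries = [[1.0, 0.0]]                    # one latent query vector
    text_tokens = [[1.0, 0.0], [0.0, 1.0]]    # toy text embeddings
    image_tokens = [[0.5, 0.5]]               # toy image embedding
    print(fuse_text_and_image(queries, text_tokens, image_tokens))
```

Keeping the two attention passes separate means the image branch can be added on top of a frozen text-conditioned model, which is consistent with the summary's point about leaving the original model parameters unchanged.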