Harmonizing Stable Diffusion and GPT-4 for Mural Expansion with ArtExtend

Dufeng Chen, Yuqing Yang, Zehua Wang, Zishan Xu, Jueting Liu, Tingting Xu, Wei Chen
DOI: https://doi.org/10.1007/978-981-97-5600-1_39
2024-01-01
Abstract: The Dunhuang murals have a long history, and changes over the centuries have made them hard to restore. This study addresses a limitation of existing multimodal learning models: they lack the semantic information needed for scene expansion of Dunhuang mural images. To this end, 887 groups of Dunhuang mural patterns and their text descriptions were collected. The proposed method, ArtExtend, builds on the Stable Diffusion model. It first uses BLIP-2 to encode keywords from mural images and, in conjunction with an image-text contrastive loss, to represent mural image features accurately. It then uses black-box prompt optimization (BPO) to optimize how the GPT-4 large model expands the mural keywords into scene descriptions. Finally, it fine-tunes the Stable Diffusion model with LoRA and expands Dunhuang mural images using the GPT-4 descriptions. The experimental results show that the proposed method outperforms the benchmark model, reducing LPIPS by 28.5% and FID by 30.5% and improving CLIPScore by 16.6%, which indicates that the model understands Dunhuang murals better. The study combines deep language understanding with advanced image generation techniques in a new way. The approach reduces the time and cost of traditional restoration work and improves the applicability of large models to mural painting.
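The LoRA fine-tuning step mentioned in the abstract can be illustrated with a minimal sketch of the low-rank update itself. This is not the authors' implementation: the abstract gives no rank, scaling factor, or layer sizes, so `rank`, `alpha`, the matrix dimensions, and all function names below are hypothetical. The sketch only shows the general idea of LoRA, in which a frozen weight matrix W is adapted as W + (alpha / rank) * B A, with B initialized to zero so that the adapted layer initially behaves exactly like the pretrained one.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_init(d_out, d_in, rank, seed=0):
    """Create hypothetical LoRA factors A (rank x d_in) and B (d_out x rank).

    A gets small random values and B is all zeros, so B A is zero at the start.
    """
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_out)]
    return a, b


def lora_forward(w, a, b, x, alpha, rank):
    """Compute (W + (alpha / rank) * B A) x for a column vector x.

    The merged matrix is never formed: the frozen path W x and the
    low-rank path B (A x) are computed separately and summed.
    """
    base = matmul(w, x)
    low_rank = matmul(b, matmul(a, x))
    scale = alpha / rank
    return [[p[0] + scale * q[0]] for p, q in zip(base, low_rank)]


def trainable_params(d_out, d_in, rank):
    """Number of trainable values in the LoRA factors A and B."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0]]   # frozen pretrained weight (toy values)
    x = [[1.0], [1.0]]             # input column vector
    a, b = lora_init(d_out=2, d_in=2, rank=1)
    print(lora_forward(w, a, b, x, alpha=2.0, rank=1))
    # Assumed layer size, for illustration only.
    print(trainable_params(d_out=768, d_in=768, rank=4), "vs", 768 * 768)
```

Only A and B would be trained in such a setup, which is why the number of trainable values grows with `rank * (d_in + d_out)` rather than `d_in * d_out`.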