Abstract: To address the low image quality and inadequate detail features of current zero-shot style transfer algorithms, we propose a new text-driven image style transfer model. The model first uses the CLIP (Contrastive Language-Image Pre-Training) model to convey the semantic information of the text conditions. We then design a lightweight network that quickly expresses texture information according to the text conditions. The network minimizes the cosine distance, computed by the CLIP model, between the transferred image and the text conditions, and thereby produces the style-transferred image. Furthermore, we introduce a dual attention mechanism, an identity consistency loss, and content and style feature losses to make the translated image more vivid and realistic. Extensive experimental results demonstrate that our approach enables the transfer of multiple styles based on text conditions, achieving broader, more realistic, and faster style transfer than existing methods.
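
As an illustration of the text-image objective mentioned in the abstract, the following minimal sketch computes a cosine distance between two embedding vectors. It assumes the image and text embeddings have already been produced by a CLIP encoder. The function name `cosine_distance` and the toy vectors are hypothetical and are not taken from the paper's implementation.

```python
import math


def cosine_distance(image_emb, text_emb):
    """Return 1 - cosine similarity between two equal-length vectors.

    Hypothetical helper: the inputs stand in for CLIP image and text
    embeddings, which are assumed to be computed elsewhere.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    if norm_image == 0.0 or norm_text == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_image * norm_text)


# Toy vectors standing in for real embeddings (illustrative values only).
aligned = cosine_distance([1.0, 0.0], [2.0, 0.0])     # same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])  # unrelated direction
```

A smaller distance means the stylized image lies closer to the text condition in the embedding space, so a training loop would lower this value with respect to the network parameters.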

A Text-Driven Image Style Transfer Model Based on CLIP and SCBAM