Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network

Sizhe Zheng, Pan Gao, Peng Zhou, Jie Qin
2024-05-30
Abstract: Style transfer aims to render an image with the artistic features of a style image, while maintaining the original structure. Various methods have been put forward for this task, but some challenges still exist. For instance, it is difficult for CNN-based methods to handle global information and long-range dependencies between input images, for which transformer-based methods have been proposed. Although transformers can better model the relationship between content and style images, they require high-cost hardware and time-consuming inference. To address these issues, we design a novel transformer model that includes only the encoder, thus significantly reducing the computational cost. In addition, we also find that existing style transfer methods may produce images that are under-stylized or missing content. In order to achieve better stylization, we design a content feature extractor and a style feature extractor, based on which pure content and style images can be fed to the transformer. Finally, we propose a novel network termed Puff-Net, i.e., pure content and style feature fusion network. Through qualitative and quantitative experiments, we demonstrate the advantages of our model compared to state-of-the-art ones in the literature.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to address two main challenges in the field of image style transfer:

1. How to effectively transfer artistic styles while preserving the structure of the original image content.
2. How to reduce computational cost and improve efficiency without sacrificing the quality of style transfer.

To tackle these issues, the authors propose Puff-Net (Pure Content and Style Feature Fusion Network), a novel Transformer-based style transfer network. The main contributions of Puff-Net include:

1. **Design of an efficient Transformer encoder**: By using only the encoder part of the Transformer to perform the style transfer task, it significantly reduces computational overhead and improves inference speed.
2. **Design of feature extractors**: Two feature extractors are designed, a content feature extractor and a style feature extractor, to separate pure content images and pure style images from the input images, thereby achieving better style transfer effects.
3. **Model performance**: Even with a significant reduction in model capacity, Puff-Net still demonstrates competitive performance compared to existing state-of-the-art models, achieving a good balance between the effectiveness of style transfer and model efficiency.

The paper validates the effectiveness of Puff-Net through qualitative and quantitative experiments, showing its ability to successfully transfer styles while maintaining content integrity, with high inference speed and low overall loss.
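To give a rough sense of why dropping the decoder reduces model size, the sketch below counts the learnable parameters of standard Transformer layers (self-attention, feed-forward block, layer norms) and compares an encoder-only stack with an encoder-decoder stack of the same depth. This is a back-of-envelope illustration only, not Puff-Net's actual architecture: the values of `D_MODEL`, `D_FF`, and `NUM_LAYERS` are assumptions chosen for the example, and the count ignores the feature extractors, embeddings, and the image decoder.

```python
# Back-of-envelope parameter count for standard Transformer layers.
# All sizes below are hypothetical and are NOT taken from the Puff-Net paper.


def attention_params(d_model: int) -> int:
    """Weights and biases of the Q, K, V and output projections."""
    return 4 * (d_model * d_model + d_model)


def feed_forward_params(d_model: int, d_ff: int) -> int:
    """Two linear layers: d_model -> d_ff -> d_model, each with a bias."""
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model: int) -> int:
    """Scale and shift vectors of one layer norm."""
    return 2 * d_model


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Self-attention + feed-forward + two layer norms."""
    return (
        attention_params(d_model)
        + feed_forward_params(d_model, d_ff)
        + 2 * layer_norm_params(d_model)
    )


def decoder_layer_params(d_model: int, d_ff: int) -> int:
    """Self-attention + cross-attention + feed-forward + three layer norms."""
    return (
        2 * attention_params(d_model)
        + feed_forward_params(d_model, d_ff)
        + 3 * layer_norm_params(d_model)
    )


# Hypothetical configuration, for illustration only.
D_MODEL, D_FF, NUM_LAYERS = 512, 2048, 3

encoder_only = NUM_LAYERS * encoder_layer_params(D_MODEL, D_FF)
encoder_decoder = encoder_only + NUM_LAYERS * decoder_layer_params(D_MODEL, D_FF)

print(f"encoder-only stack:    {encoder_only:,} parameters")
print(f"encoder-decoder stack: {encoder_decoder:,} parameters")
print(f"ratio: {encoder_only / encoder_decoder:.2f}")
```

Under these assumed sizes, each decoder layer is larger than an encoder layer because it carries an extra cross-attention block, so removing the decoder removes more than half of the Transformer's parameters. The actual savings in Puff-Net depend on its real configuration, which this summary does not state.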