A Unified Framework for Generalizable Style Transfer: Style and Content Separation

Yexun Zhang, Ya Zhang, Wenbin Cai
DOI: https://doi.org/10.48550/arXiv.1806.05173
2018-06-13
Abstract: Image style transfer has drawn broad attention in recent years. However, most existing methods aim to explicitly model the transformation between different styles, and the learned model is thus not generalizable to new styles. We here propose a unified style transfer framework for both character typeface transfer and neural style transfer tasks leveraging style and content separation. A key merit of such a framework is its generalizability to new styles and contents. The overall framework consists of a style encoder, a content encoder, a mixer and a decoder. The style encoder and content encoder are used to extract the style and content representations from the corresponding reference images. The mixer integrates the above two representations and feeds the result into the decoder to generate images with the target style and content. During training, the encoder networks learn to extract styles and contents from a limited number of style/content reference images. This learning framework allows simultaneous style transfer among multiple styles and can be deemed a special "multi-task" learning scenario. The encoders are expected to capture the underlying features of different styles and contents, which are generalizable to new styles and contents. Under this framework, we design two individual networks for character typeface transfer and neural style transfer, respectively. For character typeface transfer, to separate the style features and content features, we leverage the conditional dependence of styles and contents given an image. For neural style transfer, we leverage the statistical information of feature maps in certain layers to represent style. Extensive experimental results have demonstrated the effectiveness and robustness of the proposed methods.
Computer Vision and Pattern Recognition
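To make the data flow in the abstract concrete, here is a minimal Python sketch of how the four sub-networks compose. It is not the authors' implementation: the paper's sub-networks are learned neural networks, whereas here each encoder is a hypothetical fixed linear map applied to the average of its reference images, the mixer is a plain concatenation placeholder, and all names (`make_encoder`, `concat_mixer`, `transfer`) are illustrative.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Image = Sequence[float]  # a flattened toy "image"


def mean_of(images: Sequence[Image]) -> Vector:
    """Element-wise mean over a set of reference images (toy aggregation)."""
    n = len(images)
    return [sum(px) / n for px in zip(*images)]


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Dense layer without bias: one row of weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_encoder(weights: Sequence[Sequence[float]]) -> Callable[[Sequence[Image]], Vector]:
    # Hypothetical encoder: average the reference images, then project linearly.
    return lambda refs: linear(weights, mean_of(refs))


def concat_mixer(style: Vector, content: Vector) -> Vector:
    # Placeholder mixer; the typeface network described later uses a bilinear model.
    return style + content


def transfer(style_refs, content_refs, style_enc, content_enc, mixer, decoder):
    s = style_enc(style_refs)      # style representation from style references
    c = content_enc(content_refs)  # content representation from content references
    return decoder(mixer(s, c))    # image with the target style and content
```

Because each encoder consumes a set of reference images rather than a style or content label, transferring to an unseen style only requires supplying new style references; no style-specific parameters are added, which is the property the framework relies on for generalization.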
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the generalization problem in image style transfer. Most existing methods explicitly model the transformation between particular styles, so the learned model does not generalize: it cannot handle styles or contents it was not trained on. The paper therefore proposes a unified style transfer framework that applies to both character typeface transfer and neural style transfer, and improves generalization by learning separate style and content representations.

### Main contributions

1. **Unified style transfer framework**: A single framework, applicable to both character typeface transfer and neural style transfer, that learns separate style and content representations.
2. **Generalization ability**: Given a small number of reference images, the framework can generate images with new styles or new contents.
3. **Multi-task learning**: The framework performs style transfer among multiple styles simultaneously, which can be viewed as a special "multi-task" learning scenario.
4. **Experimental verification**: Extensive experiments demonstrate the effectiveness and robustness of the proposed methods.

### Framework overview

The framework consists of four sub-networks: a style encoder, a content encoder, a mixer and a decoder.

1. **Style encoder and content encoder**: Extract the style representation from the style reference images and the content representation from the content reference images.
2. **Mixer**: Combines the style and content representations.
3. **Decoder**: Generates the target image from the combined representation.

### Specific applications

1. **Character typeface transfer**: Style and content features are separated by exploiting the conditional dependence of styles and contents given an image, and the two factors are mixed with a bilinear model.
2. **Neural style transfer**: Style is represented by statistics of the feature maps in specific layers, and style and content features are mixed through statistical matching.

### Experimental setup

- **Dataset**: A dataset of 832 fonts (styles) is constructed, each containing 1,732 commonly used Chinese characters (contents), with images of 80×80 pixels. 75% of the styles and 75% of the contents are randomly selected as known styles and contents; the remaining 25% serve as new styles and contents.
- **Training and testing**: The training set is drawn from images of known styles and contents. Test sets are drawn from different subsets (for example, images with new styles, new contents, or both) to evaluate performance under different style transfer challenges.

### Conclusion

By separating style and content representations in a unified framework, the paper addresses the failure of existing methods to generalize to new styles and contents. Experiments show good performance and robustness in both character typeface transfer and neural style transfer.
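The bilinear mixing step described for character typeface transfer can be illustrated with one standard bilinear form, in which every output unit is a weighted sum of products of a style component and a content component. This is a sketch under assumptions: the tensor `W`, the vector sizes and the function name `bilinear_mix` are hypothetical, and the paper's exact parameterization may differ.

```python
from typing import List, Sequence


def bilinear_mix(style: Sequence[float], content: Sequence[float],
                 W: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Bilinear mixer: y[k] = sum_i sum_j style[i] * W[i][j][k] * content[j].

    W has shape (len(style), len(content), out_dim).
    """
    out_dim = len(W[0][0])
    y = [0.0] * out_dim
    for i, s_i in enumerate(style):
        for j, c_j in enumerate(content):
            for k in range(out_dim):
                # Each output unit accumulates style-by-content products.
                y[k] += s_i * W[i][j][k] * c_j
    return y
```

With the content vector held fixed, the output is linear in the style vector, and vice versa; this lets a bilinear model treat the two factors as separate inputs while still letting them interact multiplicatively.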
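For neural style transfer, style is represented by statistics of feature maps and mixed with content through statistical matching. A common way to do this, used here as an assumption rather than as the paper's confirmed formulation, is adaptive instance normalization (AdaIN): each content channel is normalized to zero mean and unit standard deviation and then rescaled to the matching style channel's mean and standard deviation. Feature maps are toy lists of floats, one flattened list per channel; `channel_stats`, `match_statistics` and `eps` are illustrative names.

```python
import statistics
from typing import List, Sequence, Tuple

FeatureMaps = Sequence[Sequence[float]]  # one flattened feature map per channel


def channel_stats(features: FeatureMaps) -> List[Tuple[float, float]]:
    """Style representation: per-channel (mean, population std) of the feature maps."""
    return [(statistics.fmean(ch), statistics.pstdev(ch)) for ch in features]


def match_statistics(content: FeatureMaps, style: FeatureMaps,
                     eps: float = 1e-5) -> List[List[float]]:
    """AdaIN-style mixing: give each content channel the style channel's mean and std."""
    mixed = []
    for ch, (mu_s, sigma_s) in zip(content, channel_stats(style)):
        mu_c, sigma_c = channel_stats([ch])[0]
        # Normalize the content channel, then rescale with the style statistics.
        mixed.append([sigma_s * (x - mu_c) / (sigma_c + eps) + mu_s for x in ch])
    return mixed
```

The `eps` term keeps the division defined when a content channel is constant, in which case every output value equals the style mean.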
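The 75%/25% split of styles and contents in the experimental setup can be sketched as follows. The seeds, the rounding rule and the subset labels are hypothetical; the setup above only states that 75% of the 832 fonts and of the 1,732 characters are chosen at random as known.

```python
import random
from typing import List, Set, Tuple


def split_known_new(n_items: int, known_frac: float = 0.75,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly partition item ids 0..n_items-1 into known and new subsets."""
    ids = list(range(n_items))
    random.Random(seed).shuffle(ids)
    n_known = round(n_items * known_frac)
    return sorted(ids[:n_known]), sorted(ids[n_known:])


def subset_name(style_id: int, char_id: int,
                known_styles: Set[int], known_chars: Set[int]) -> str:
    """Label a test image by whether its style and its content were seen in training."""
    s = "known" if style_id in known_styles else "new"
    c = "known" if char_id in known_chars else "new"
    return f"{s} style / {c} content"


N_FONTS, N_CHARS = 832, 1732  # dataset sizes stated in the summary
known_styles, new_styles = split_known_new(N_FONTS)
known_chars, new_chars = split_known_new(N_CHARS, seed=1)
```

With these numbers, 624 fonts and 1,299 characters are known, leaving 208 new fonts and 433 new characters; a test image can then be labeled by whether its style and its content are known.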
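One way to exploit the conditional dependence of style and content during training, sketched here as an assumption rather than as the paper's documented procedure, is to pair each target image (style s, character c) with r style references that share style s but show other characters, and r content references that share character c but are rendered in other fonts. Images are represented by hypothetical `(style_id, char_id)` index pairs, and `r` and `sample_training_example` are illustrative names.

```python
import random
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]  # hypothetical (style_id, char_id) index of one glyph image


def sample_training_example(style_id: int, char_id: int,
                            style_pool: Iterable[int], char_pool: Iterable[int],
                            r: int, rng: random.Random):
    """Build (style_refs, content_refs, target) for one training step."""
    # Style references: same style as the target, different characters.
    other_chars = [c for c in char_pool if c != char_id]
    # Content references: same character as the target, different styles.
    other_styles = [s for s in style_pool if s != style_id]
    style_refs: List[Pair] = [(style_id, c) for c in rng.sample(other_chars, r)]
    content_refs: List[Pair] = [(s, char_id) for s in rng.sample(other_styles, r)]
    return style_refs, content_refs, (style_id, char_id)
```

Because the style references vary in content and the content references vary in style, the only factor each reference set has in common is the one its encoder is meant to extract. Drawing `style_pool` and `char_pool` from the known styles and characters keeps the new ones unseen until testing.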