MStarGAN: A Face Style Transfer Network with Changeable Style Intensity
Liao Yuanhong, Qian Wenhua, Cao Jinde
DOI: https://doi.org/10.11834/jig.221149
2023-01-01
Journal of Image and Graphics
Abstract:

Objective: Style transfer algorithms transfer the style of an art image to a natural image. The style image provides features such as texture and brush strokes, while the content image provides the contour structure. The goal is to synthesize a new stylized image that combines the texture and strokes of the style image with the contour structure of the content image. Early face style transfer algorithms used mathematical modeling to build filters that collect statistics on the local features of the target image and then established a statistical model to describe the image style. However, these algorithms generate only a single style, produce results whose style is not obvious, and require manual modeling, which limits their efficiency. With the rise of deep learning, style transfer algorithms have adopted deep learning models as their core. Because a generative adversarial network (GAN) can generate images that follow a given distribution, a trained GAN can generate target images that are similar to real images. GANs have therefore been widely used in image style transfer. The main image style transfer algorithms fall into two categories. Algorithms in the first category, such as pix2pix and CycleGAN, improve the GAN itself without using a pre-encoder. Algorithms in the second category, such as StyleGAN and StarGAN, add a pre-encoder before the GAN structure, which makes the network more complex but yields highly realistic results. To overcome the shortcomings of face style transfer algorithms such as StarGAN and MSGAN, namely poor learning of detail style, weak style transfer effects, and distorted outputs, we present a face style transfer algorithm with controllable style intensity called multi-layer StarGAN (MStarGAN).

Method: First, we construct the pre-encoder through the feature
pyramid network (FPN) to generate multi-layer feature vectors that capture image detail features. Compared with the original 1 × 64 feature vector, the FPN-based pre-encoder outputs a 6 × 256 feature vector that contains additional details of the original image. The generated image can therefore learn the detailed style of the style image during style transfer. Second, we use the pre-encoder to generate style vectors for the original and style images, combine these vectors, and use the combined style vector for style transfer. We can also adjust the number of layers of this vector so that the style of the generated image is biased toward either the original or the style image, which yields different style transfer intensities. Third, we introduce a new loss function that balances the style of the generated image so that it is not too biased toward either the original or the style image. Fourth, we use weight demodulation as the style transfer module in the generator. The conventional AdaIN method has been shown to distort the generated image. By replacing the normalization of the feature map with an operation on the convolution weights, we eliminate artifacts in the feature map and reduce distortion in the generated image.

Result: We implement our model in Python and test it on the CelebA-HQ dataset with an RTX 2080 Ti GPU. Our model not only generates high-quality random face images but also enables the generated images to learn the style of the style images, including hair and skin color. In the latent-guided synthesis experiment, compared with the multimodal unsupervised image-to-image translation (MUNIT), diverse image-to-image translation, MSGAN, and StarGAN V2 algorithms, the Fréchet inception distance (FID) of the proposed algorithm is reduced by 18.5, 39.2, 20.2, and 0.8, respectively, while its learned perceptual image patch similarity (LPIPS) is increased by
0.181, 0.366, 0.155, and 0.092, respectively. In the reference-guided synthesis experiment, the FID of the proposed algorithm is reduced by 86.4, 32.6, 18.9, and 3.1, respectively, while its LPIPS is increased by 0.23, 0.095, 0.094, and 0.018, respectively. In sum, our algorithm can generate images with different styles and style intensities.

Conclusion: The proposed algorithm can transfer the detail style of an image, control the style intensity of the output image, and reduce distortion in the generated image.
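
The two mechanisms described in the Method part, layer-wise combination of 6 × 256 style vectors and weight demodulation, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names `mix_style_layers` and `modulate_and_demodulate` are hypothetical, the choice of taking the first layers from the style image is an assumption (the abstract does not say which layers are swapped), and the demodulation follows the general StyleGAN2-style formulation, simplified here to a 1 × 1 convolution.

```python
import math


def mix_style_layers(original_style, reference_style, num_reference_layers):
    """Combine two multi-layer style vectors (e.g. 6 x 256) layer by layer.

    Assumption: the first `num_reference_layers` layers are taken from the
    style (reference) image and the remaining layers from the original image.
    A larger value biases the result toward the style image.
    """
    if len(original_style) != len(reference_style):
        raise ValueError("both style vectors need the same number of layers")
    num_layers = len(original_style)
    if not 0 <= num_reference_layers <= num_layers:
        raise ValueError("num_reference_layers is out of range")
    return [
        list(reference_style[i]) if i < num_reference_layers
        else list(original_style[i])
        for i in range(num_layers)
    ]


def modulate_and_demodulate(weights, style_scales, eps=1e-8):
    """Apply the style to convolution weights instead of to feature maps.

    `weights` is a list of rows, one per output channel, each holding one
    weight per input channel (a 1 x 1 convolution, for simplicity).
    `style_scales` holds one scale per input channel.
    """
    result = []
    for row in weights:
        if len(row) != len(style_scales):
            raise ValueError("one style scale is needed per input channel")
        # Modulation: scale each input channel's weight by the style.
        scaled = [w * s for w, s in zip(row, style_scales)]
        # Demodulation: rescale so each output channel has roughly unit norm.
        norm = math.sqrt(sum(v * v for v in scaled) + eps)
        result.append([v / norm for v in scaled])
    return result
```

Under these assumptions, calling `mix_style_layers(original, reference, k)` with `k` between 0 and 6 gives seven discrete intensity levels, from pure original style to pure reference style. Because `modulate_and_demodulate` normalizes the weights rather than the activations, no per-feature-map statistics are computed at run time, which is the property the abstract credits with removing feature artifacts.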