Real-Time Portrait Stylization on the Edge

Yanyu Li, Xuan Shen, Geng Yuan, Jiexiong Guan, Wei Niu, Hao Tang, Bin Ren, Yanzhi Wang
DOI: https://doi.org/10.48550/arXiv.2206.01244
2022-06-03
Abstract: In this work, we demonstrate real-time portrait stylization, specifically translating self-portraits into cartoon or anime style on mobile devices. We propose a latency-driven differentiable architecture search method that maintains realistic generative quality. With our framework, we obtain a $10\times$ computation reduction on the generative model and achieve real-time video stylization on off-the-shelf smartphones using mobile GPUs.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses real-time portrait stylization on mobile devices, that is, converting self-portraits into cartoon or anime styles. Specifically, the authors propose a latency-driven differentiable architecture search method that reduces the computational cost of the generative model while preserving generation quality.

### Main problems

1. **High computational complexity**: Image translation models such as GANs typically use an encoder-decoder design, which is computationally expensive on high-resolution images and therefore hard to run in real time on mobile devices.
2. **Unstable training**: GAN training is difficult and unstable, prone to loss divergence and mode collapse. This makes it hard to integrate existing compression techniques into GAN training while maintaining generation quality.

### Solutions

To address these problems, the authors propose a compiler-aware differentiable architecture search framework. The main contributions are:

1. **Latency-driven differentiable architecture search**:
   - The width and depth of the model are optimized by measuring the latency of building blocks and training a neural network to predict latency.
   - A Straight-Through Estimator (STE) binarizes the architecture parameters to {0, 1}, so that latency can be predicted for a concrete architecture state while pruned weights can still be restored at any time.
2. **Real-time portrait stylization**:
   - The authors achieve real-time video stylization on smartphones for the first time, with efficient inference on mobile GPUs.
   - Experiments show a 10× reduction in computation while maintaining generation quality.
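The search idea in the first contribution can be illustrated with a toy. This is a minimal sketch, not the authors' implementation: it assumes a single layer with one architecture parameter per channel, replaces the learned latency predictor with a hypothetical linear function (`predicted_latency`, `MS_PER_CHANNEL`, `BASE_MS`), and uses a made-up task loss in which each channel has a fixed `importance`. It shows the two properties attributed to the STE: the forward pass uses hard {0, 1} gates, so latency is evaluated for a concrete architecture, while the backward pass treats the gate as the identity, so a pruned channel still receives gradients and can be revived.

```python
# Toy sketch of latency-driven channel search with a straight-through estimator.
# Hypothetical assumptions: a single layer with one architecture parameter per
# channel, a linear stand-in for the learned latency predictor, and a toy task
# loss. Gradients are written out by hand so only plain Python is needed.

MS_PER_CHANNEL = 0.05  # hypothetical latency of one active channel (ms)
BASE_MS = 1.0          # hypothetical fixed latency of the layer (ms)


def binarize(alpha):
    """Forward pass: hard {0, 1} gate per channel (keep the channel if alpha > 0)."""
    return [1.0 if a > 0 else 0.0 for a in alpha]


def predicted_latency(gates):
    """Hypothetical latency predictor: linear in the number of active channels."""
    return BASE_MS + MS_PER_CHANNEL * sum(gates)


def search_step(alpha, importance, lat_weight, lr=0.5):
    """One gradient step on alpha for loss = task(gates) + lat_weight * latency(gates).

    task(gates) = sum(importance[i] * (1 - gates[i])) is a toy stand-in for the
    generator loss: switching channel i off costs importance[i].
    Returns the updated alpha and the loss at the current (binarized) gates.
    """
    gates = binarize(alpha)
    task = sum(imp * (1.0 - g) for imp, g in zip(importance, gates))
    loss = task + lat_weight * predicted_latency(gates)
    # d loss / d gate_i for this toy loss (the same for any gate value).
    dgates = [-imp + lat_weight * MS_PER_CHANNEL for imp in importance]
    # Straight-through estimator: treat d gate_i / d alpha_i as 1, so every
    # alpha, including those of currently pruned channels, receives a gradient.
    dalpha = dgates
    new_alpha = [a - lr * g for a, g in zip(alpha, dalpha)]
    return new_alpha, loss


def search(alpha, importance, lat_weight, steps=20):
    """Run several search steps and return the final binary architecture."""
    for _ in range(steps):
        alpha, _ = search_step(alpha, importance, lat_weight)
    return binarize(alpha)


if __name__ == "__main__":
    importance = [0.5, 0.01, 0.2, 0.001]
    print(search([0.1] * 4, importance, lat_weight=1.0))  # [1.0, 0.0, 1.0, 0.0]
    print(search([0.1] * 4, importance, lat_weight=0.0))  # [1.0, 1.0, 1.0, 1.0]
```

With `lat_weight=1.0` the two low-importance channels are pruned, while with `lat_weight=0.0` every channel is kept. Starting a high-importance channel at a negative `alpha` shows it switching back on after a single step, which is the "restorable" behavior the STE provides.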
### Training objective

- **Adversarial loss**: \[ L_{\text{gan}}^X=\mathbb{E}_{y \sim Y}\big[(D_Y(y))^2\big]+\mathbb{E}_{x \sim X}\big[(1 - D_Y(G(x)))^2\big] \]
- **Cycle-consistency loss**: \[ L_{\text{cyc}}=\mathbb{E}_{x \sim X}\big[\lVert F(G(x)) - x\rVert_1\big]+\mathbb{E}_{y \sim Y}\big[\lVert G(F(y)) - y\rVert_1\big] \]
- **Overall GAN objective**: \[ L=\lambda_1 L_{\text{gan}}^X+\lambda_1 L_{\text{gan}}^Y+\lambda_2 L_{\text{cyc}}+\lambda_3 L_{\text{id}}+\lambda_4 L_{\text{CAM}} \]

Here \(L_{\text{gan}}^Y\) is the corresponding adversarial loss for the reverse mapping \(F\), \(L_{\text{id}}\) is the identity loss, \(L_{\text{CAM}}\) is the class activation map (CAM) loss, and \(\lambda_1 = 1\), \(\lambda_2 = 10\), \(\lambda_3 = 10\), \(\lambda_4 = 1000\) are hyperparameters that weight each loss term.

Through these improvements, the authors achieve efficient real-time portrait stylization on mobile devices, opening new possibilities for social media applications and other portable smart devices.
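To make the weighting of the objective concrete, the sketch below evaluates its terms on tiny hand-made batches in plain Python. It is illustrative only and assumes that expectations are batch means, images are flat lists of pixel values, the L1 distance is averaged per pixel, both expectations in the adversarial term are squared (least-squares form), and \(L_{\text{id}}\) and \(L_{\text{CAM}}\) are passed in as precomputed numbers. All function names are hypothetical.

```python
# Toy evaluation of the objective terms on tiny hand-made batches.
# Assumptions: expectations are batch means, each "image" is a flat list of
# pixel values, the L1 distance is averaged per pixel, and L_id / L_CAM are
# passed in as precomputed numbers because they are not spelled out above.

LAMBDA_1, LAMBDA_2, LAMBDA_3, LAMBDA_4 = 1.0, 10.0, 10.0, 1000.0


def mean(values):
    return sum(values) / len(values)


def adversarial_loss(d_real, d_fake):
    """E_y[(D_Y(y))^2] + E_x[(1 - D_Y(G(x)))^2] from discriminator scores."""
    return mean([d ** 2 for d in d_real]) + mean([(1.0 - d) ** 2 for d in d_fake])


def l1(a, b):
    """Per-pixel mean absolute difference between two flat images."""
    return mean([abs(p - q) for p, q in zip(a, b)])


def cycle_loss(xs, fgxs, ys, gfys):
    """E_x[||F(G(x)) - x||_1] + E_y[||G(F(y)) - y||_1] over a batch."""
    return (mean([l1(r, x) for r, x in zip(fgxs, xs)])
            + mean([l1(r, y) for r, y in zip(gfys, ys)]))


def total_loss(gan_x, gan_y, cyc, idt, cam):
    """Weighted sum using the hyperparameters listed in the text."""
    return (LAMBDA_1 * gan_x + LAMBDA_1 * gan_y + LAMBDA_2 * cyc
            + LAMBDA_3 * idt + LAMBDA_4 * cam)


if __name__ == "__main__":
    gan_x = adversarial_loss(d_real=[0.0, 0.0], d_fake=[0.5, 0.5])  # 0.25
    cyc = cycle_loss(xs=[[0.0, 1.0]], fgxs=[[0.5, 1.0]],
                     ys=[[0.2]], gfys=[[0.2]])  # 0.25
    print(total_loss(gan_x, 0.0, cyc, 0.0, 0.0))  # 2.75
```

Even this small example shows the effect of \(\lambda_2 = 10\): a cycle error of 0.25 contributes 2.5 to the total, ten times an adversarial term of the same size.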