What problem does this paper attempt to address?

This paper aims to achieve real-time portrait stylization on mobile devices, that is, converting self-portraits into cartoon or anime styles. Specifically, the authors propose a latency-driven differentiable architecture search method that reduces the computational complexity of the generative model while maintaining high generation quality.

### Main problems

1. **High computational complexity**: Traditional image translation models (such as GANs) usually adopt an encoder-decoder design, which is computationally expensive on high-resolution images and makes real-time processing on mobile devices difficult.
2. **Unstable training**: GAN training is difficult and unstable, and is prone to loss divergence and mode collapse, which makes it hard to integrate existing compression techniques into GAN training while preserving generation quality.

### Solutions

To solve these problems, the authors propose a compiler-aware differentiable architecture search framework. The main contributions are:

1. **Latency-driven differentiable architecture search**:
   - Optimize the width and depth of the model by measuring the latency of building blocks and training a neural network to predict latency.
   - Use the Straight Through Estimator (STE) to sparsify the architecture parameters to {0, 1}, so that latency can be predicted for a specific state while the functionality of the pruned weights can be restored at any time.
2. **Real-time portrait stylization**:
   - The authors report the first real-time video stylization on smartphones, with efficient inference on mobile GPUs.
   - Experiments show a significant (10x) reduction in computation while maintaining generation quality.
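The STE step described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Gate` class, the 0.5 threshold, the learning rate, and the per-block latency numbers are all hypothetical, and a simple sum over active blocks stands in for the trained latency-prediction network.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """Hypothetical architecture parameter gating one building block."""
    alpha: float  # continuous value kept in [0, 1]

    def forward(self, threshold: float = 0.5) -> int:
        # Forward pass: binarize to {0, 1} so latency is evaluated for a concrete state.
        return 1 if self.alpha >= threshold else 0

    def backward(self, grad_wrt_binary: float, lr: float = 0.1) -> None:
        # Straight-through estimator: pass the gradient of the binary gate
        # unchanged to the continuous parameter (as if d(binary)/d(alpha) = 1).
        self.alpha = min(1.0, max(0.0, self.alpha - lr * grad_wrt_binary))


def estimated_latency(gates, block_latency_ms):
    # Stand-in for the learned latency predictor: sum the latency of active blocks.
    return sum(g.forward() * t for g, t in zip(gates, block_latency_ms))


gates = [Gate(0.9), Gate(0.6), Gate(0.4)]
block_latency_ms = [4.0, 2.5, 3.0]  # made-up per-block measurements
latency_before = estimated_latency(gates, block_latency_ms)

# A latency penalty pushes the second gate down until the block is pruned.
gates[1].backward(grad_wrt_binary=2.5)
latency_pruned = estimated_latency(gates, block_latency_ms)

# A later gradient in the opposite direction (e.g. from the task loss) re-enables it.
gates[1].backward(grad_wrt_binary=-2.0)
latency_restored = estimated_latency(gates, block_latency_ms)

print(latency_before, latency_pruned, latency_restored)
```

Because only the gate is binarized and the block's own weights are left untouched, a pruned block can become active again once its continuous parameter crosses the threshold, which is the "restorable" property the summary refers to.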

### Formula representation

- **Adversarial Loss**:
  \[ L_{\text{gan}}^X = \mathbb{E}_{y \sim Y}[D_Y(y)] + \mathbb{E}_{x \sim X}[(1 - D_Y(G(x)))^2] \]
- **Cycle Consistency Loss**:
  \[ L_{\text{cyc}} = \mathbb{E}_{x \sim X}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim Y}[\lVert G(F(y)) - y \rVert_1] \]
- **Overall GAN objective function**:
  \[ L = \lambda_1 L_{\text{gan}}^X + \lambda_1 L_{\text{gan}}^Y + \lambda_2 L_{\text{cyc}} + \lambda_3 L_{\text{id}} + \lambda_4 L_{\text{CAM}} \]
  where \(\lambda_1 = 1\), \(\lambda_2 = 10\), \(\lambda_3 = 10\), \(\lambda_4 = 1000\) are hyper-parameters that weight each loss term.

Through these improvements, the authors achieve efficient real-time portrait stylization on mobile devices, opening new possibilities for social media applications and other portable smart devices.
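As a rough illustration of how the cycle consistency term and the weighted objective combine, here is a small Python sketch. It makes several assumptions: images are flat lists of floats, the toy mappings `G`, `F`, and `F_bad` are hypothetical stand-ins for the two generators, expectations are taken as sample means, and the adversarial, identity, and CAM terms are passed in as precomputed numbers rather than computed here.

```python
from statistics import fmean


def l1_distance(a, b):
    # L1 norm of the difference between two flat "images".
    return sum(abs(p - q) for p, q in zip(a, b))


def cycle_consistency_loss(xs, ys, G, F):
    # E_x[||F(G(x)) - x||_1] + E_y[||G(F(y)) - y||_1], with sample means as expectations.
    forward = fmean([l1_distance(F(G(x)), x) for x in xs])
    backward = fmean([l1_distance(G(F(y)), y) for y in ys])
    return forward + backward


def total_loss(gan_x, gan_y, cyc, idt, cam, lambdas=(1.0, 10.0, 10.0, 1000.0)):
    # Weighted sum using the lambda values quoted in the summary.
    lam1, lam2, lam3, lam4 = lambdas
    return lam1 * gan_x + lam1 * gan_y + lam2 * cyc + lam3 * idt + lam4 * cam


# Toy mappings: F inverts G exactly, F_bad leaves a residual of 0.25 per pixel.
G = lambda img: [p + 0.5 for p in img]
F = lambda img: [p - 0.5 for p in img]
F_bad = lambda img: [p - 0.25 for p in img]

xs = [[0.0, 1.0]]
ys = [[0.5, 0.25]]

cyc_perfect = cycle_consistency_loss(xs, ys, G, F)
cyc_imperfect = cycle_consistency_loss(xs, ys, G, F_bad)
objective = total_loss(gan_x=0.5, gan_y=0.5, cyc=cyc_imperfect, idt=0.25, cam=0.0)

print(cyc_perfect, cyc_imperfect, objective)
```

With a perfect inverse the cycle term vanishes; with the imperfect one, each two-pixel image contributes 0.5 in each direction, and the factor \(\lambda_2 = 10\) makes that residual dominate the toy objective.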

Real-Time Portrait Stylization on the Edge
