Abstract: Image style transfer has advanced significantly in recent years thanks to deep convolutional neural networks (CNNs). However, directly applying image style transfer algorithms to each frame of a video independently often leads to flickering and unstable results. In this work, we present a self-supervised space-time CNN-based method for online video style transfer, named VTNet, which is trained end-to-end from nearly unlimited unlabeled video data to produce temporally coherent stylized videos in real time. Specifically, VTNet transfers the style of a reference image to the source video frames and is formed by two branches: the temporal prediction branch and the stylizing branch. The temporal prediction branch captures discriminative spatiotemporal features for temporal consistency and is pretrained in an adversarial manner on unlabeled video data. The stylizing branch transfers the style image to a video frame, with guidance from the temporal prediction branch to ensure temporal consistency. To guide the training of VTNet, we introduce the style-coherence loss net (SCNet), which combines the content loss, the style loss, and the newly designed coherence loss. These losses are computed on high-level features extracted from a pretrained VGG-16 network. The content loss preserves the high-level abstract content of the input frames, and the style loss introduces new colors and patterns from the style image. Instead of using optical flow to explicitly correct the stylized video frames, we design the coherence loss so that the stylized video inherits the dynamics and motion patterns of the source video, which removes temporal flickering. Extensive subjective and objective evaluations on various styles demonstrate that the proposed method achieves favorable results against state-of-the-art methods with high efficiency.
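
The way the three SCNet losses could combine is sketched below in plain Python. This is an illustration under stated assumptions, not the authors' implementation. Feature maps are toy lists (one flat list of activations per channel) standing in for VGG-16 features. The content and style terms follow the common perceptual-loss and Gram-matrix formulations. The abstract does not define the coherence loss, so the form used here (matching frame-to-frame feature changes between the stylized and source videos) is an assumption, and the function names and loss weights are hypothetical.

```python
# Sketch only: a feature map is a list of channels, each a flat list of floats.
# In the paper these would be VGG-16 activations; here they are toy values.

def mse(a, b):
    """Mean squared difference between two equally shaped feature maps."""
    total, count = 0.0, 0
    for ch_a, ch_b in zip(a, b):
        for u, v in zip(ch_a, ch_b):
            total += (u - v) ** 2
            count += 1
    return total / count

def gram(feat):
    """Gram matrix of channel inner products, normalized by positions per channel."""
    n = len(feat[0])
    return [[sum(u * v for u, v in zip(ci, cj)) / n for cj in feat] for ci in feat]

def content_loss(stylized, source):
    """Keep the stylized frame's features close to the source frame's features."""
    return mse(stylized, source)

def style_loss(stylized, style):
    """Match second-order feature statistics (Gram matrices) of the style image."""
    return mse(gram(stylized), gram(style))

def frame_delta(curr, prev):
    """Per-position feature change between two consecutive frames."""
    return [[u - v for u, v in zip(cc, cp)] for cc, cp in zip(curr, prev)]

def coherence_loss(sty_t, sty_prev, src_t, src_prev):
    """ASSUMED form: the stylized video should change between frames the same
    way the source video does. The abstract does not give the actual formula."""
    return mse(frame_delta(sty_t, sty_prev), frame_delta(src_t, src_prev))

def total_loss(sty_t, sty_prev, src_t, src_prev, style, weights=(1.0, 10.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical."""
    w_content, w_style, w_coherence = weights
    return (w_content * content_loss(sty_t, src_t)
            + w_style * style_loss(sty_t, style)
            + w_coherence * coherence_loss(sty_t, sty_prev, src_t, src_prev))
```

Under this assumed form, a stylized clip that differs from the source only by a constant feature offset incurs zero coherence loss, because its frame-to-frame changes are identical to the source's. No optical flow is needed to evaluate the term.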

NLUT: Neural-based 3D Lookup Tables for Video Photorealistic Style Transfer

Correlation-based and Content-Enhanced Network for Video Style Transfer

Artistic Style Transfer with Internal-external Learning and Contrastive Learning

TeSTNeRF: Text-Driven 3D Style Transfer Via Cross-Modal Learning

AdaCM: Adaptive ColorMLP for Real-Time Universal Photo-realistic Style Transfer

GLStyleNet: Exquisite Style Transfer Combining Global and Local Pyramid Features

Learning Structure-Aware Transformations for Arbitrary Image Style Transfer

Real-time Localized Photorealistic Video Style Transfer

Style Permutation for Diversified Arbitrary Style Transfer

Universal Photorealistic Style Transfer: A Lightweight and Adaptive Approach

NCST: Neural-based Color Style Transfer for Video Retouching

UPST-NeRF: Universal Photorealistic Style Transfer of Neural Radiance Fields for 3D Scene

ColoristaNet for Photorealistic Video Style Transfer

Real-time Arbitrary Video Style Transfer

Neural Preset for Color Style Transfer

Coherent Online Video Style Transfer

Stable Video Style Transfer Based on Partial Convolution with Depth-Aware Supervision

Fast Universal Style Transfer for Artistic and Photorealistic Rendering

Image Neural Style Transfer with Preserving the Salient Regions

Learning Self-Supervised Space-Time CNN for Fast Video Style Transfer

Structure-Guided Arbitrary Style Transfer for Artistic Image and Video