Abstract: Image style transfer has advanced significantly in recent years, driven by deep convolutional neural networks (CNNs). However, applying an image style transfer algorithm to each frame of a video independently often produces flickering, unstable results. In this work, we present VTNet, a self-supervised space-time CNN for online video style transfer. VTNet is trained end-to-end on nearly unlimited unlabeled video data and produces temporally coherent stylized videos in real time. Specifically, VTNet transfers the style of a reference image to the source video frames and consists of two branches: a temporal prediction branch and a stylizing branch. The temporal prediction branch, pretrained adversarially on unlabeled video data, captures discriminative spatiotemporal features for temporal consistency. The stylizing branch transfers the style of the reference image to each video frame, guided by the temporal prediction branch to keep the output temporally consistent. To guide the training of VTNet, we introduce the style-coherence loss net (SCNet), which combines a content loss, a style loss, and a newly designed coherence loss. All three losses are computed on high-level features extracted by a pretrained VGG-16 network. The content loss preserves the high-level abstract content of the input frames, and the style loss introduces the colors and patterns of the style image. Instead of using optical flow to explicitly correct the stylized frames, the coherence loss makes the stylized video inherit the dynamics and motion patterns of the source video, which removes temporal flickering. Extensive subjective and objective evaluations on various styles demonstrate that the proposed method compares favorably with state-of-the-art methods while remaining highly efficient.
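The abstract does not give the exact SCNet formulas. As a minimal sketch, assuming the content and style losses follow the standard neural style transfer formulation (feature-space mean squared error for content, Gram-matrix distance for style), they can be written in NumPy as below. Feature maps are assumed to be `(C, H, W)` arrays already extracted from VGG-16 layers; the extraction itself is not shown. The function `coherence_loss_sketch` and the weights `alpha`, `beta`, `gamma` are hypothetical: the coherence term simply compares frame-to-frame feature changes of the stylized video with those of the source video, one flow-free way to express "inheriting the source dynamics", and is not the paper's actual definition.

```python
import numpy as np


def gram_matrix(feat):
    """Gram matrix of a (C, H, W) feature map, normalized by C*H*W."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)


def content_loss(stylized_feat, content_feat):
    """Mean squared feature difference; keeps high-level content of the input frame."""
    return float(np.mean((stylized_feat - content_feat) ** 2))


def style_loss(stylized_feats, style_feats):
    """Sum over layers of squared Frobenius distances between Gram matrices."""
    return float(sum(np.sum((gram_matrix(s) - gram_matrix(t)) ** 2)
                     for s, t in zip(stylized_feats, style_feats)))


def coherence_loss_sketch(stylized_prev, stylized_cur, source_prev, source_cur):
    """HYPOTHETICAL coherence term (not the paper's formula): penalize differences
    between the temporal feature change of the stylized frames and that of the
    source frames, without any optical flow."""
    stylized_motion = stylized_cur - stylized_prev
    source_motion = source_cur - source_prev
    return float(np.mean((stylized_motion - source_motion) ** 2))


def total_loss(l_content, l_style, l_coherence, alpha=1.0, beta=10.0, gamma=1.0):
    """Weighted sum of the three terms; the default weights are placeholders."""
    return alpha * l_content + beta * l_style + gamma * l_coherence


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src_prev, src_cur = rng.standard_normal((2, 4, 8, 8))
    sty_prev, sty_cur = src_prev + 0.1, src_cur + 0.1   # stand-in "stylized" features
    style_ref = rng.standard_normal((4, 8, 8))
    lc = content_loss(sty_cur, src_cur)
    ls = style_loss([sty_cur], [style_ref])
    lt = coherence_loss_sketch(sty_prev, sty_cur, src_prev, src_cur)
    print(lc, ls, lt, total_loss(lc, ls, lt))
```

In this toy run the stylized features are the source features plus a constant offset, so the hypothetical coherence term is essentially zero: the output changes over time exactly as the source does, even though its content loss is nonzero.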

Stable Video Style Transfer Based on Partial Convolution with Depth-Aware Supervision

Correlation-based and Content-Enhanced Network for Video Style Transfer

Artistic Style Transfer with Internal-external Learning and Contrastive Learning

Learning Structure-Aware Transformations for Arbitrary Image Style Transfer

Diverse Image Style Transfer Via Invertible Cross-Space Mapping

Optimal Transport of Deep Feature for Image Style Transfer

Style Permutation for Diversified Arbitrary Style Transfer

Incorporating Multiscale Contextual Loss for Image Style Transfer

Structure-Guided Arbitrary Style Transfer for Artistic Image and Video

Real-time Localized Photorealistic Video Style Transfer

Towards efficient image and video style transfer via distillation and learnable feature transformation

Consistent Video Style Transfer Via Compound Regularization

CVSTGAN: A Controllable Generative Adversarial Network for Video Style Transfer of Chinese Painting

Real-time Arbitrary Video Style Transfer

Learning Self-Supervised Space-Time CNN for Fast Video Style Transfer

UniVST: A Unified Framework for Training-free Localized Video Style Transfer

Style-A-Video: Agile Diffusion for Arbitrary Text-Based Video Style Transfer

Preserving Structural Consistency in Arbitrary Artist and Artwork Style Transfer

Photographic style transfer

Image Neural Style Transfer with Preserving the Salient Regions

Consistent Video Style Transfer Via Relaxation and Regularization