Towards efficient image and video style transfer via distillation and learnable feature transformation

Jing Huo, Meihao Kong, Wenbin Li, Jing Wu, Yu-Kun Lai, Yang Gao
DOI: https://doi.org/10.1016/j.cviu.2024.103947
IF: 4.886
2024-02-03
Computer Vision and Image Understanding
Abstract: Despite the recent rapid development of neural style transfer, existing methods are still somewhat inefficient or have a large model size, which limits their application on devices with limited computational resources. The main reason is that they usually adopt a pre-trained VGG-19 backbone, which is relatively large, or a feature transformation module that is computationally heavy. To address these problems, we propose a DIstillation based Style Transfer framework (called DIST) in conjunction with an efficient feature transformation module for arbitrary image and video style transfer. The distillation module yields a highly compressed backbone network, which is 15.95× smaller than the VGG-19 based backbone. The proposed feature transformation is capable of transforming the content features in an extremely efficient feed-forward pass. For video style transfer, the above framework is further combined with a temporal consistency regularization loss. Extensive experiments show that the proposed method is superior to state-of-the-art image and video style transfer methods, even with a much smaller model size.
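The abstract does not state the form of the temporal consistency regularization loss. As a rough illustration only, one common formulation in video style transfer penalizes the difference between the current stylized frame and the previous stylized frame after it has been warped into the current frame's coordinates, ignoring occluded pixels. The sketch below assumes that formulation; the function name, the argument names, the mask convention, and the assumption that warping by optical flow has already been done are all hypothetical and are not taken from the DIST paper.

```python
import numpy as np


def temporal_consistency_loss(stylized_t, warped_prev, mask):
    """Masked mean squared difference between two stylized frames.

    Assumptions (not from the paper):
      stylized_t  : (H, W, C) stylized output for the current frame
      warped_prev : (H, W, C) previous stylized output, already warped
                    into the current frame by optical flow
      mask        : (H, W) array, 1 where the flow is valid, 0 where occluded
    """
    stylized_t = np.asarray(stylized_t, dtype=np.float64)
    warped_prev = np.asarray(warped_prev, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)

    # Per-pixel, per-channel squared error.
    sq_diff = (stylized_t - warped_prev) ** 2

    # Broadcast the (H, W) mask over the channel axis to drop occluded pixels.
    weighted = sq_diff * mask[..., None]

    # Normalize by the number of valid values; guard against an all-zero mask.
    denom = max(mask.sum() * stylized_t.shape[-1], 1.0)
    return float(weighted.sum() / denom)
```

In training, a term of this kind would typically be added to the content and style losses with a weighting coefficient; the actual loss and its weight should be taken from the paper itself.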
computer science, artificial intelligence; engineering, electrical & electronic