Temporally Consistent Unpaired Multi-domain Video Translation by Contrastive Learning

Ruiyang Fan, Qiyu Sun, Ruihao Xia, Yang Tang
DOI: https://doi.org/10.1109/ijcnn60899.2024.10650014
2024-01-01
Abstract: Unpaired multi-domain video-to-video translation is an attractive approach to diverse video translation, but it must handle both unpaired data and spatio-temporal inconsistency. Most current video-to-video translation models based on cycle consistency introduce optical flow as motion information to achieve spatio-temporal consistency. However, warping with optical flow generates meaningless invisible regions outside the field of view, which are produced by stretching the edge area of the original image and negatively affect model training. In this work, we propose Contrastive learning for Multi-domain Video-to-video Translation, which replaces cycle consistency in order to avoid the effects of invisible regions. Specifically, we first introduce synthetic optical flow to maintain spatio-temporal consistency. Then, we use the attention mechanism as the selection principle for positive and negative samples, and eliminate the features of invisible regions by sorting the feature entropy. Frequency-domain information is also used to maintain individual consistency. Experiments on the public datasets Viper and INIT show that our method generalizes across multiple datasets and achieves state-of-the-art performance in generating temporally consistent multi-domain videos.
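
The invisible regions mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `warp_with_flow`, the nearest-neighbour sampling, and the border-clamping behaviour are all assumptions chosen to show how out-of-view samples end up repeating edge pixels.

```python
import numpy as np

def warp_with_flow(image, flow):
    """Backward-warp a 2D image by a flow field (hypothetical helper).

    image: (H, W) array; flow: (H, W, 2) array of (dx, dy) offsets.
    Returns the warped image and a boolean mask that is True wherever
    the sampling location fell outside the frame (the "invisible" region).
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = xs + flow[..., 0]
    src_y = ys + flow[..., 1]

    # Pixels whose source coordinate lies outside the field of view.
    invisible = (src_x < 0) | (src_x > w - 1) | (src_y < 0) | (src_y > h - 1)

    # Assumed border clamping: out-of-view samples reuse the nearest edge
    # pixel, which stretches the edge area across the invisible region.
    cx = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    cy = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return image[cy, cx], invisible

# Toy example: a synthetic flow that samples 2 pixels to the right.
image = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 2.0
warped, invisible = warp_with_flow(image, flow)
```

In this toy case the two rightmost columns of `invisible` are True, and the corresponding columns of `warped` simply repeat the last column of `image`. Features drawn from such a region carry no real scene content, which is the kind of feature the abstract proposes to exclude.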