Optical Flow-based Spatiotemporal Sketch for Video Representation: A Novel Framework

Qiyuan Du, Yiping Duan, Zhipeng Xie, Xiaoming Tao, Linsu Shi, Zhijuan Jin
DOI: https://doi.org/10.1109/tcsvt.2023.3349130
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: With the rapid development of multimedia services and the dramatic growth of video data volume, efficient video representation and AI-generated content (AIGC) have become critical parts of future multimedia communication systems. A sketch graph is a structured abstraction of the key textures in an image, and a video sketch graph further exploits the temporal continuity of videos to achieve a sparse representation. Sketch-based representation has potential applications in communication systems for both human subjective perception and machine vision tasks, and offers a new approach to AIGC. However, current video sketch extraction methods rely on human assistance and correction and cannot be applied to end-to-end communication systems. We design a novel framework for spatiotemporal sketch extraction based on deep learning methods. In the proposed framework, sketch extraction and sparse coding are performed at the sender side using structural and temporal features of the video. The original videos are generatively reconstructed at the receiver side or applied to downstream machine vision tasks. We validate the performance of the proposed method on the Cityscapes dataset with different metrics. Experiments show that our proposed framework can be adapted end-to-end to video communication tasks in different scenarios and can achieve efficient video characterization and transmission. Moreover, our proposed method enables sketch-based end-to-end AIGC for video generation.