Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection

Bingyao Yu, Wanhua Li, Xiu Li, Jiwen Lu, Jie Zhou
DOI: https://doi.org/10.1109/ICCV48922.2021.00808
2021-01-01
Abstract: In this paper, we propose a Frequency-Aware Spatiotemporal Transformer (FAST) for video inpainting detection, which aims to mine the traces of video inpainting simultaneously from the spatial, temporal, and frequency domains. Unlike existing deep video inpainting detection methods, which usually rely on hand-designed attention modules and memory mechanisms, our proposed FAST has an innate global self-attention mechanism that captures long-range relations. Because existing video inpainting methods usually exploit the spatial and temporal connections in a video, our method employs a spatiotemporal transformer framework to detect the spatial connections between patches and the temporal dependencies between frames. As inpainted videos usually lack high-frequency details, FAST also exploits frequency-domain information through a specifically designed decoder. Extensive experimental results demonstrate that our approach achieves very competitive performance and generalizes well.
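The frequency argument above (inpainted regions usually lack high-frequency detail) can be illustrated with a simple hand-crafted cue. The sketch below is not the FAST decoder and does not reproduce the authors' method. It assumes grayscale frames, non-overlapping 16×16 patches (a hypothetical, ViT-style token size), and a hypothetical radial cutoff at 0.25 of the Nyquist frequency. The names `high_freq_ratio`, `patch_hf_map`, and `video_hf_tokens` are illustrative only. For each patch it measures the fraction of spectral energy above the cutoff using a 2D FFT, producing a (frames × rows × cols) grid that mirrors the spatiotemporal token layout a transformer would attend over.

```python
import numpy as np


def high_freq_ratio(patch: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of a patch's spectral energy above a radial frequency cutoff.

    `cutoff` is a hypothetical threshold expressed as a fraction of Nyquist
    (Nyquist = 1.0). Smooth, low-detail patches give values near 0.
    """
    # Remove the DC component so flat patches carry no energy at all.
    centered = patch.astype(np.float64) - patch.mean()
    power = np.abs(np.fft.fft2(centered)) ** 2
    h, w = centered.shape
    # fftfreq returns cycles/sample in [-0.5, 0.5); divide by 0.5 so Nyquist = 1.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2) / 0.5
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[radius > cutoff].sum() / total)


def patch_hf_map(frame: np.ndarray, patch: int = 16, cutoff: float = 0.25) -> np.ndarray:
    """Per-patch high-frequency ratios for one frame (hypothetical patch size)."""
    rows, cols = frame.shape[0] // patch, frame.shape[1] // patch
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = frame[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            out[i, j] = high_freq_ratio(block, cutoff)
    return out


def video_hf_tokens(video: np.ndarray, patch: int = 16, cutoff: float = 0.25) -> np.ndarray:
    """Stack per-frame maps: (T, H, W) video -> (T, H // patch, W // patch) grid."""
    return np.stack([patch_hf_map(f, patch, cutoff) for f in video])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64))  # detailed, noise-like texture
    # Simulate an over-smooth fill: one low-frequency cosine period per row.
    fill = 0.5 + 0.2 * np.cos(2 * np.pi * np.arange(16) / 16)
    inpainted = frame.copy()
    inpainted[16:32, 16:32] = np.tile(fill, (16, 1))
    # The filled patch (row 1, col 1) should stand out with a ratio near 0.
    print(np.round(video_hf_tokens(np.stack([frame, inpainted]))[1], 3))
```

A fixed FFT threshold like this is brittle (compression, motion blur, and naturally smooth surfaces also lack high frequencies), which is one motivation for learning the frequency cue jointly with spatial and temporal attention rather than hand-designing it.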