MvAV-pix2pixHD: Multi-view Aerial View Image Translation
Jun Yu, Keda Lu, Shenshen Du, Lin Xu, Peng Chang, Houde Liu, Bin Lan, Tianyu Liu
DOI: https://doi.org/10.1109/cvprw63382.2024.00312
2024-01-01
Computer Vision and Pattern Recognition
Abstract: Multi-modal aerial view image translation converts aerial images from one modality to another while preserving their essential details and features. These modalities include Synthetic Aperture Radar (SAR), Infrared (IR), Visible Light (RGB), Electro-Optical (EO), and other image types. Various methods have recently been proposed for this task, but they tend to focus on paired images and overlook the discrepancies between aerial images of the same location captured at different times and angles, a setting termed incomplete matching or multi-view image translation. To address this issue, we propose MvAV-pix2pixHD. For multi-view data sampling, we propose two methods: random sampling and time-priority sampling. Within the pix2pixHD framework, we introduce an inverse generator to preserve the basic semantic features of the generated images and incorporate three robust loss functions to constrain their authenticity. We conduct extensive experiments on two multi-view image translation tasks in the Multi-modal Aerial View Imagery Challenge: Translation (MAVIC-T). Experimental results demonstrate the superiority of our proposed method, which achieved second place in the MAVIC-T competition at the 20th IEEE Workshop on Perception Beyond the Visible Spectrum at CVPR 2024.
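The abstract names two multi-view sampling strategies without defining them. The sketch below is one possible reading, not the authors' implementation: it assumes each source image has several candidate target-modality views of the same location, that random sampling picks one uniformly, and that time-priority sampling prefers the candidate captured closest in time to the source. The `View` class, its `timestamp` field, the file names, and both function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """One aerial image of a location (hypothetical structure)."""
    path: str
    timestamp: float  # capture time in seconds; an assumed field


def random_sample(candidates, rng):
    """Pick any candidate target view with equal probability."""
    return rng.choice(candidates)


def time_priority_sample(source, candidates):
    """Pick the candidate captured closest in time to the source view.

    This interpretation of "time-priority" is an assumption.
    """
    return min(candidates, key=lambda v: abs(v.timestamp - source.timestamp))


# Hypothetical SAR source with three EO candidates of the same location.
source = View("sar_0001.png", 100.0)
candidates = [
    View("eo_a.png", 40.0),
    View("eo_b.png", 95.0),
    View("eo_c.png", 300.0),
]

picked_random = random_sample(candidates, random.Random(0))
picked_nearest = time_priority_sample(source, candidates)
```

Under these assumptions, random sampling exposes the generator to the full range of viewpoint and time discrepancies, while time-priority sampling favors pairs that are likely to differ least in scene content.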