Conditional Diffusion Model With Spatial-Frequency Refinement for SAR-to-Optical Image Translation
Jiang Qin, Kai Wang, Bin Zou, Lamei Zhang, Joost van de Weijer
DOI: https://doi.org/10.1109/tgrs.2024.3491826
IF: 8.2
2024-11-26
IEEE Transactions on Geoscience and Remote Sensing
Abstract: The presence of speckles and geometric distortions poses a serious challenge to the visual interpretation of synthetic aperture radar (SAR) images. SAR-to-optical (S2O) image translation technology provides a feasible solution and has attracted increasing attention. Restricted by substantial gaps between optical and SAR images, current S2O translation methods unavoidably produce geometric distortions, missing targets, and low-fidelity images, thereby limiting subsequent cross-modal applications. In this article, we propose an augmented conditional denoising diffusion probabilistic model with spatial-frequency refinement (SFDiff) for high-fidelity S2O image translation. SFDiff progressively narrows the gap between synthesized and real images from both spatial and frequency perspectives, showing notable performance in terms of quality and consistency. Specifically, to incorporate the rich spatial content priors provided by SAR images, we design an SAR context prior extractor (SCPE) with denoising enhancement to extract multiscale conditional representations, thereby helping SFDiff capture more descriptive cues for S2O translation. In addition, a spatial-frequency complementary learning (SFCL) module is designed to learn spatial semantics while simultaneously enhancing informative frequency components and global dependencies. Furthermore, SFDiff is optimized using a joint spatial-frequency refinement loss, which facilitates iterative refinement in both the spatial and frequency domains to enhance content consistency and fidelity in the synthesized images. Experiments on the UNICORN and SEN12 datasets show that SFDiff maintains a high level of content and structural consistency, yielding visually appealing translation results that surpass state-of-the-art (SOTA) methods. In particular, SFDiff exhibits excellent performance in preserving small targets and details, which is crucial in cross-modal detection applications.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
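
The abstract mentions a joint spatial-frequency refinement loss but does not give its form. The following is a minimal sketch of one common way to combine a spatial term with a frequency term, assuming an L1 pixel difference plus an L1 difference between 2-D FFT spectra. The function name `spatial_frequency_loss` and the weight `freq_weight` are hypothetical and chosen for illustration; this is not the loss defined in the paper.

```python
import numpy as np


def spatial_frequency_loss(pred, target, freq_weight=0.1):
    """Illustrative joint loss: spatial L1 plus L1 between 2-D FFT spectra.

    freq_weight is a hypothetical balancing factor, not a value from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Spatial term: mean absolute pixel difference.
    spatial = np.mean(np.abs(pred - target))

    # Frequency term: mean absolute difference between the complex
    # 2-D spectra, computed over the last two axes (height, width).
    pred_f = np.fft.fft2(pred, norm="ortho")
    target_f = np.fft.fft2(target, norm="ortho")
    frequency = np.mean(np.abs(pred_f - target_f))

    return spatial + freq_weight * frequency


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.random((3, 8, 8))   # stand-in for a real optical image
    synth = rng.random((3, 8, 8))  # stand-in for a synthesized image
    print(spatial_frequency_loss(synth, real))
```

In this sketch the frequency term penalizes mismatched spectral content that a purely pixel-wise term can underweight, which is the general motivation the abstract gives for refining in both domains.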