CMAR-Net: Accurate Cross-Modal 3D SAR Reconstruction of Vehicle Targets with Sparse Multi-Baseline Data

Da Li, Guoqiang Zhao, Houjun Sun, Jiacheng Bao
2025-01-14
Abstract: Multi-baseline Synthetic Aperture Radar (SAR) three-dimensional (3D) tomography is a crucial remote sensing technique that provides 3D resolution unavailable in conventional SAR imaging. However, achieving high-quality imaging typically requires multi-angle or full-aperture data, resulting in significant imaging costs. Recent advancements in sparse 3D SAR, which rely on data from limited apertures, have gained attention as a cost-effective alternative. Notably, deep learning techniques have markedly enhanced the imaging quality of sparse 3D SAR. Despite these advancements, existing methods primarily depend on high-resolution radar images for supervising the training of deep neural networks (DNNs). This exclusive dependence on single-modal data prevents the introduction of complementary information from other data sources, limiting further improvements in imaging performance. In this paper, we introduce a Cross-Modal 3D-SAR Reconstruction Network (CMAR-Net) to enhance 3D SAR imaging by integrating heterogeneous information. Leveraging cross-modal supervision from 2D optical images and error transfer guaranteed by differentiable rendering, CMAR-Net achieves efficient training and reconstructs highly sparse multi-baseline SAR data into visually structured and accurate 3D images, particularly for vehicle targets. Extensive experiments on simulated and real-world datasets demonstrate that CMAR-Net significantly outperforms SOTA sparse reconstruction algorithms based on compressed sensing (CS) and deep learning (DL). Furthermore, our method eliminates the need for time-consuming full-aperture data preprocessing and relies solely on computer-rendered optical images, significantly reducing dataset construction costs. This work highlights the potential of deep learning for multi-baseline SAR 3D imaging and introduces a novel framework for radar imaging research through cross-modal learning.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses a problem in multi-baseline Synthetic Aperture Radar (SAR) three-dimensional (3D) imaging: how to achieve high-quality 3D SAR reconstruction from sparse multi-baseline data. Existing methods rely on high-resolution radar images to supervise the training of Deep Neural Networks (DNNs). This limits further gains in imaging performance and requires substantial time and resources for full-aperture data preprocessing. Moreover, because these methods depend on single-modal data, they cannot easily incorporate complementary information from other data sources. To address these problems, the paper proposes a Cross-Modal 3D SAR Reconstruction Network (CMAR-Net). The network improves 3D SAR imaging quality by integrating heterogeneous information and using 2D optical images for cross-modal supervision.

The main contributions of the paper are:

1. **Introduction of cross-modal learning**: The concept of cross-modal learning is introduced into SAR 3D reconstruction for the first time. Supervision from 2D optical images is used to overcome the inherent resolution limitations of electromagnetic images.
2. **Proposal of CMAR-Net**: The network combines a unique data augmentation strategy with a projection-back-projection module, which enhances robustness and generalization. It is trained only on simulated data and achieves good results on real data without fine-tuning.
3. **Simplified dataset construction**: Only 2D optical images are required as supervision data. This eliminates the need for high-resolution full-aperture data and reduces the cost and complexity of building datasets.
4. **Strong experimental results**: Under low Signal-to-Noise Ratio (SNR) and highly sparse angular sampling, CMAR-Net still substantially improves the quality of 3D target reconstruction. On average, it raises the Peak Signal-to-Noise Ratio (PSNR) by 75.83% and the Structural Similarity Index (SSIM) by 47.85%.

### Formula summary

The key formulas used in the paper are summarized below.

- **Volume rendering formula**:
  \[ C(r)=\int_{t_{n}}^{t_{f}}T(t)\cdot\sigma(r(t))\,dt \]
  Here \(\sigma(r(t))\) is the volume density at point \(r(t)\) along the camera ray, and \(T(t)\) is the accumulated transmittance from the near bound \(t_{n}\) to \(t\):
  \[ T(t)=\exp\left(-\int_{t_{n}}^{t}\sigma(r(u))\,du\right) \]
- **Discrete integral estimation**: The ray is sampled at points \(t_{1}<t_{2}<\dots\) in \([t_{n},t_{f}]\), and the integral is approximated by
  \[ \hat{C}(r)=\sum_{i=1}^{N}T_{i}\left(1-\exp(-\sigma_{i}\delta_{i})\right) \]
  where
  \[ T_{i}=\exp\left(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j}\right) \]
  and \(\delta_{i}=t_{i+1}-t_{i}\) is the distance between adjacent samples.
- **Huber loss function**:
  \[ L_{\text{huber}}(I_{g}^{i},I_{r}^{i},\gamma)= \begin{cases} \frac{1}{2}\left\|I_{g}^{i}-I_{r}^{i}\right\|^{2}&\text{if }\left\|I_{g}^{i}-I_{r}^{i}\right\|\leq\gamma\\ \gamma\left\|I_{g}^{i}-I_{r}^{i}\right\|-\frac{1}{2}\gamma^{2}&\text{otherwise} \end{cases} \]
  Here \(\gamma\) is the threshold at which the loss switches from quadratic to linear. The total loss averages the per-view Huber loss over the \(V\) views:
  \[ L=\frac{1}{V}\sum_{i=0}^{V-1}L_{\text{huber}}(I_{g}^{i},I_{r}^{i},\gamma) \]

Through these improvements, CMAR-Net improves the quality of 3D SAR reconstruction and also simplifies dataset construction, demonstrating the potential of cross-modal learning for SAR imaging.
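A short worked check shows how the continuous and discrete rendering formulas relate. This example is our own and does not come from the paper. Assume the density is constant, \(\sigma(r(t))=\sigma\), on \([t_{n},t_{f}]\), and that the \(N\) sampling intervals are uniform with \(\delta_{i}=\delta=(t_{f}-t_{n})/N\):

\[ T(t)=e^{-\sigma(t-t_{n})},\qquad C(r)=\int_{t_{n}}^{t_{f}}\sigma\,e^{-\sigma(t-t_{n})}\,dt=1-e^{-\sigma(t_{f}-t_{n})} \]

\[ \hat{C}(r)=\sum_{i=1}^{N}e^{-(i-1)\sigma\delta}\left(1-e^{-\sigma\delta}\right)=1-e^{-N\sigma\delta}=1-e^{-\sigma(t_{f}-t_{n})} \]

The sum telescopes. As a result, the discrete estimate equals the integral exactly whenever the density is constant within each sampling interval. Any quadrature error therefore comes only from density variation inside an interval.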
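The discrete estimator \(\hat{C}(r)\) can be sketched in NumPy for a single ray. This is a minimal illustration under our own assumptions, not the paper's implementation. The function name `render_ray` is hypothetical. The caller also supplies \(N+1\) sample positions so that every \(\delta_{i}=t_{i+1}-t_{i}\) is defined.

```python
import numpy as np


def render_ray(sigma, t):
    """Discrete estimate C_hat(r) = sum_i T_i * (1 - exp(-sigma_i * delta_i)).

    sigma: densities sigma_1..sigma_N at the N samples along one ray.
    t: positions t_1..t_{N+1}; delta_i = t_{i+1} - t_i, so t has N + 1 entries
       (this N + 1 convention is an assumption of this sketch).
    """
    sigma = np.asarray(sigma, dtype=float)
    t = np.asarray(t, dtype=float)
    if t.shape[0] != sigma.shape[0] + 1:
        raise ValueError("t must contain N + 1 positions for N densities")

    delta = np.diff(t)                 # delta_i = t_{i+1} - t_i
    optical_depth = sigma * delta      # sigma_i * delta_i for each interval

    # T_i = exp(-sum_{j<i} sigma_j * delta_j): exclusive cumulative sum, so T_1 = 1
    accumulated = np.concatenate(([0.0], np.cumsum(optical_depth)[:-1]))
    transmittance = np.exp(-accumulated)

    alpha = 1.0 - np.exp(-optical_depth)   # opacity contributed by each interval
    return float(np.sum(transmittance * alpha))
```

For example, `render_ray(np.full(64, 2.0), np.linspace(0.0, 1.0, 65))` returns \(1-e^{-2}\approx 0.8647\) up to rounding. This matches the constant-density closed form \(1-e^{-\sigma(t_{f}-t_{n})}\). NeRF-style code often takes only \(N\) positions and pads the last interval with a very large \(\delta_{N}\) instead. The summary does not say which convention the paper uses.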
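The Huber loss and its average over views can also be sketched in NumPy. Two choices here are ours rather than the paper's. First, \(\|\cdot\|\) is read literally as the L2 norm of the whole image difference; many implementations instead apply Huber per pixel and then average. Second, the names `huber_loss` and `total_loss` are hypothetical.

```python
import numpy as np


def huber_loss(i_g, i_r, gamma):
    """L_huber(I_g, I_r, gamma), with d = ||I_g - I_r|| taken as the L2 norm
    of the flattened image difference (a literal reading of the formula)."""
    d = float(np.linalg.norm(np.asarray(i_g, dtype=float) - np.asarray(i_r, dtype=float)))
    if d <= gamma:
        return 0.5 * d ** 2             # quadratic regime
    return gamma * d - 0.5 * gamma ** 2  # linear regime, continuous at d = gamma


def total_loss(gt_views, rendered_views, gamma):
    """L = (1/V) * sum_{i=0}^{V-1} L_huber(I_g^i, I_r^i, gamma) over V views."""
    if len(gt_views) == 0 or len(gt_views) != len(rendered_views):
        raise ValueError("need the same non-zero number of views in both lists")
    v = len(gt_views)
    return sum(huber_loss(g, r, gamma) for g, r in zip(gt_views, rendered_views)) / v
```

The linear branch subtracts \(\frac{1}{2}\gamma^{2}\) so that both branches equal \(\frac{1}{2}\gamma^{2}\) at \(d=\gamma\). This keeps the loss continuous while capping the gradient magnitude for large errors.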
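Contribution 4 reports gains in PSNR. For reference, the sketch below applies the standard definition \(\text{PSNR}=10\log_{10}(\text{MAX}^{2}/\text{MSE})\) to two reconstructed volumes. The summary does not state how the paper normalizes volumes or computes the percentage improvement. The default `data_range=1.0`, which assumes volumes scaled to \([0,1]\), is therefore an assumption, and `psnr` is a hypothetical helper name.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Standard PSNR in dB: 10 * log10(data_range**2 / MSE).

    data_range=1.0 assumes both volumes are normalized to [0, 1] (an assumption).
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs have unbounded PSNR
    return float(10.0 * np.log10(data_range ** 2 / mse))
```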