Abstract: Leveraging multi-view remote sensing images in scene classification tasks significantly improves classification accuracy. Using multi-view images simultaneously, however, introduces new challenges: it often causes a misalignment between the visual content and the semantic labels, which complicates classification. In addition, as the number of viewpoints increases, variations in remote sensing image quality further limit the effectiveness of multi-view classification. Traditional scene classification methods predominantly rely on softmax-based deep learning classifiers, which can neither assess the quality of remote sensing images nor provide explicit explanations for the network's predictions. To address these issues, this paper introduces a novel end-to-end multi-view decision fusion network for remote sensing scene classification. The network integrates information from multi-view remote sensing images under the guidance of image credibility and uncertainty; when the views conflict during fusion, it substantially alleviates the conflict and yields more reasonable and credible multi-view scene classification results. First, multi-scale features are extracted from the multi-view images using convolutional neural networks (CNNs). Next, an asymptotic adaptive feature fusion module (AAFFM) is constructed to progressively integrate these multi-scale features. An adaptive spatial fusion method then assigns different spatial weights to the multi-scale feature maps, thereby significantly enhancing the model's feature discrimination capability. Finally, an evidence decision fusion module (EDFM) based on evidence theory and the Dirichlet distribution is developed to quantitatively assess the uncertainty in the multi-view image classification process.
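The adaptive spatial fusion step can be illustrated with a minimal NumPy sketch. It assumes an ASFF-style (adaptively spatial feature fusion) formulation: the multi-scale maps are first resized to a common resolution, and a per-pixel softmax across scales produces the spatial weights. The function names, the nearest-neighbour upsampling, and passing the weight logits in directly (rather than predicting them with learned convolutions) are assumptions for illustration; the paper's AAFFM may compute its weights differently.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_spatial_fusion(features, weight_logits):
    """Fuse L same-resolution feature maps with per-pixel weights (hypothetical helper).

    features:      list of L arrays, each of shape (C, H, W)
    weight_logits: array of shape (L, H, W); in a trained network these
                   would be predicted from the features (assumed given here)
    Returns the fused (C, H, W) map and the (L, H, W) weights.
    """
    stacked = np.stack(features, axis=0)                     # (L, C, H, W)
    weights = softmax(weight_logits, axis=0)                 # sums to 1 over L at every pixel
    fused = (weights[:, None, :, :] * stacked).sum(axis=0)   # same weight for all channels of a pixel
    return fused, weights


# Usage: three scales (16x16, 8x8, 4x4) brought to 16x16, then fused.
rng = np.random.default_rng(0)
f1 = rng.standard_normal((8, 16, 16))
f2 = upsample_nearest(rng.standard_normal((8, 8, 8)), 2)
f3 = upsample_nearest(rng.standard_normal((8, 4, 4)), 4)
logits = rng.standard_normal((3, 16, 16))
fused, w = adaptive_spatial_fusion([f1, f2, f3], logits)
print(fused.shape)  # (8, 16, 16)
```

Because the weights vary per pixel rather than per map, each location can favour whichever scale is most informative there, which is the intuition behind assigning different spatial weights to the multi-scale feature maps.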
By fusing multi-view remote sensing image information, this module also provides a rational explanation for the prediction results. The efficacy of the proposed method was validated through experiments on the AiRound and CV-BrCT datasets. The results show that our method not only improves single-view scene classification but also advances multi-view remote sensing scene classification by accurately characterizing the scene and mitigating conflicts in the fusion process.
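To make the Dirichlet-based uncertainty and the conflict-aware fusion more concrete, here is a minimal NumPy sketch. It assumes EDFM follows the subjective-logic formulation used in trusted multi-view classification (Han et al.): each view's non-negative evidence e parameterizes a Dirichlet distribution with α = e + 1, giving belief masses b_k = e_k / S and an uncertainty mass u = K / S with S = Σα, and two views are merged with a reduced Dempster's combination rule whose normalizer 1 - C discounts their conflict C. The softplus evidence function and all function names are hypothetical; the paper's exact evidence head and combination rule may differ.

```python
import numpy as np


def evidence_from_logits(logits):
    """Non-negative evidence via softplus, log(1 + exp(z)) (one common choice, assumed here)."""
    return np.logaddexp(0.0, np.asarray(logits, dtype=float))


def opinion_from_evidence(evidence):
    """Dirichlet(alpha = e + 1) -> belief masses b_k = e_k / S and uncertainty u = K / S."""
    evidence = np.asarray(evidence, dtype=float)
    num_classes = evidence.shape[-1]
    strength = (evidence + 1.0).sum()   # Dirichlet strength S
    return evidence / strength, num_classes / strength


def combine_opinions(op1, op2):
    """Reduced Dempster's rule for two opinions over the same K classes."""
    b1, u1 = op1
    b2, u2 = op2
    # Conflict C = sum over i != j of b1[i] * b2[j]
    conflict = b1.sum() * b2.sum() - np.dot(b1, b2)
    scale = 1.0 / (1.0 - conflict)
    belief = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    uncertainty = scale * u1 * u2
    return belief, uncertainty


def fuse_views(evidence_per_view):
    """Fold the pairwise rule over all views (hypothetical helper)."""
    opinions = [opinion_from_evidence(e) for e in evidence_per_view]
    fused = opinions[0]
    for op in opinions[1:]:
        fused = combine_opinions(fused, op)
    return fused


# Usage: a confident view and a low-evidence view over 3 classes.
view_a = np.array([20.0, 1.0, 1.0])   # b = [0.8, 0.04, 0.04], u = 0.12
view_b = np.array([0.5, 1.0, 0.5])    # b = [0.1, 0.2, 0.1],  u = 0.6
belief, u = fuse_views([view_a, view_b])
print(int(np.argmax(belief)), round(float(u), 3))  # prints: 0 0.097
```

In this example the confident view dominates, and the fused uncertainty (about 0.097) is lower than that of either input view, because each view's uncertainty mass is multiplied by the other's. A view with little evidence therefore contributes little to the fused belief, which is one way such a module can temper conflicts between views.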

Multi-View Scene Classification Based on Feature Integration and Evidence Decision Fusion