Abstract: Many real-world applications involve data from multiple modalities and thus exhibit view heterogeneity. For example, user modeling on social media might leverage both the topology of the underlying social network and the content of the users' posts; in the medical domain, the multiple views could be X-ray images taken at different poses. To date, various techniques, such as methods based on canonical correlation analysis, have achieved promising results. Meanwhile, it is critical for decision-makers to be able to understand the predictions these methods produce. For example, given a diagnosis that a model produced from X-ray images of a patient at different poses, the doctor needs to know why the model made that prediction. However, state-of-the-art techniques usually fail to utilize the complementary information of each view and to explain their predictions in an interpretable manner. To address these issues, we propose a deep co-attention network for multi-view subspace learning, which aims to extract both the common information and the complementary information in an adversarial setting and to provide end-users with robust interpretations of its predictions via the co-attention mechanism. In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation by incorporating the classifier into the model. This improves the quality of the latent representation and accelerates convergence. Finally, we develop an efficient iterative algorithm to find the optimal encoders and discriminator, which are evaluated extensively on synthetic and real-world data sets. We also conduct a case study to demonstrate how the proposed method robustly interprets its predictions on an image data set.

Learning to Learn Multiview Detection by Camera-Aware Attention

Multi-View Domain Adaptive Object Detection on Camera Networks

Multiview Detection with Feature Perspective Transformation

Query-Based Multiview Detection for Multiple Visual Sensor Networks

MVM3Det: A Novel Method for Multi-view Monocular 3D Detection

Deep Co-Attention Network for Multi-View Subspace Learning

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

Learning to Select Camera Views: Efficient Multiview Understanding at Few Glances

Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting

3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection

Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

Structured Knowledge Distillation Towards Efficient Multi-View 3D Object Detection

Multi-View Adaptive Fusion Network for 3D Object Detection

Structured Knowledge Distillation Towards Efficient and Compact Multi-View 3D Detection

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Learning Multi-view Anomaly Detection

DVPE: Divided View Position Embedding for Multi-View 3D Object Detection

Multi-View 3D Object Detection Network for Autonomous Driving

Attention-Aware Multi-View Stereo

Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting