Abstract: Many real-world applications involve data from multiple modalities and thus exhibit view heterogeneity. For example, user modeling on social media might leverage both the topology of the underlying social network and the content of the users' posts; in the medical domain, multiple views could be X-ray images taken at different poses. To date, various techniques have been proposed that achieve promising results, such as methods based on canonical correlation analysis. Meanwhile, it is critical for decision-makers to be able to understand the prediction results from these methods. For example, given the diagnostic result that a model provided based on the X-ray images of a patient at different poses, the doctor needs to know why the model made such a prediction. However, state-of-the-art techniques usually fail to utilize the complementary information of each view and to explain their predictions in an interpretable manner. To address these issues, we propose a deep co-attention network for multi-view subspace learning, which aims to extract both the common information and the complementary information in an adversarial setting and to provide end-users with robust interpretations of the predictions via the co-attention mechanism. In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation by incorporating the classifier into the model. This improves the quality of the latent representation and accelerates convergence. Finally, we develop an efficient iterative algorithm to find the optimal encoders and discriminator, which are evaluated extensively on synthetic and real-world data sets. We also conduct a case study to demonstrate how the proposed method robustly interprets the predictions on an image data set.

From Shared Subspaces to Shared Landmarks: A Robust Multi-Source Classification Approach

Multi-View Correlated Feature Learning by Uncovering Shared Component

Deep Co-Attention Network for Multi-View Subspace Learning

MOON: A Subspace-Based Multi-Branch Network for Object Detection in Remotely Sensed Images

Multi-task learning for subspace segmentation

Incremental Shared Subspace Learning for Multi-label Classification

Learning Shared Cross-modality Representation Using Multispectral-LiDAR and Hyperspectral Data

Shared Subspace Learning for Latent Representation of Multi-View Data

Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach

Visual Landmark Learning Via Attention-Based Deep Neural Networks

Retargeted Multi-View Feature Learning with Separate and Shared Subspace Uncovering

Scalable Machine Learning Approaches for Neighborhood Classification Using Very High Resolution Remote Sensing Imagery

LoCUS: Learning Multiscale 3D-consistent Features from Posed Images

Transfer Across Completely Different Feature Spaces Via Spectral Embedding

Multi-view Latent Space Learning Based on Local Discriminant Embedding

Combining Deep Learning and Model-Based Methods for Robust Real-Time Semantic Landmark Detection

Multi-source transfer learning based on label shared subspace

Multisensor Land Cover Classification With Sparsely Annotated Data Based on Convolutional Neural Networks and Self-Distillation

Robust Subspace Segmentation by Simultaneously Learning Data Representations and Their Affinity Matrix

Robust Structured Subspace Learning for Data Representation

Graph-Induced Aligned Learning on Subspaces for Hyperspectral and Multispectral Data