Abstract: Many real-world applications involve data from multiple modalities and thus exhibit view heterogeneity. For example, user modeling on social media might leverage both the topology of the underlying social network and the content of the users' posts; in the medical domain, the multiple views could be X-ray images taken at different poses. To date, various techniques, such as canonical correlation analysis based methods, have been proposed and achieve promising results. Meanwhile, it is critical for decision-makers to be able to understand the predictions of these methods. For example, given a diagnosis that a model produced from X-ray images of a patient at different poses, the doctor needs to know why the model made that prediction. However, state-of-the-art techniques are usually unable to utilize the complementary information of each view or to explain their predictions in an interpretable manner. To address these issues, we propose a deep co-attention network for multi-view subspace learning, which aims to extract both the common information and the complementary information in an adversarial setting and to provide end-users with robust interpretations of its predictions via the co-attention mechanism. In particular, it uses a novel cross reconstruction loss and leverages label information to guide the construction of the latent representation by incorporating the classifier into the model. This improves the quality of the latent representation and accelerates convergence. Finally, we develop an efficient iterative algorithm to find the optimal encoders and discriminator, which are evaluated extensively on synthetic and real-world data sets. We also conduct a case study to demonstrate how the proposed method robustly interprets its predictions on an image data set.

Deep Semisupervised Class- and Correlation-Collapsed Cross-View Learning

Multi-View Correlated Feature Learning by Uncovering Shared Component

Deep Constrained Low-Rank Subspace Learning for Multi-View Semi-Supervised Classification

Deep Co-Attention Network for Multi-View Subspace Learning

Deep Correlated Predictive Subspace Learning for Incomplete Multi-View Semi-Supervised Classification

Unsupervised Multiview Nonnegative Correlated Feature Learning for Data Clustering

Multi-view Common Component Discriminant Analysis for Cross-view Classification

Co-Learning Non-Negative Correlated and Uncorrelated Features for Multi-View Data

Self-supervised Correlation Learning for Cross-Modal Retrieval

Deep Semisupervised Multiview Learning with Increasing Views

Sparse Regularized Discriminative Canonical Correlation Analysis for Multi-View Semi-Supervised Learning

Cross-modal correlation learning with deep convolutional architecture

Robust Multi-view Common Component Learning

Deep Multi-View Subspace Clustering With Unified and Discriminative Learning

Intra-View and Inter-View Supervised Correlation Analysis for Multi-View Feature Learning

Semisupervised Cross-Media Retrieval By Distance-Preserving Correlation Learning And Multi-Modal Manifold Regularization

Semantics And Locality Preserving Correlation Projections

Semisupervised Progressive Representation Learning for Deep Multiview Clustering

Deep Multiview Clustering Via Iteratively Self-Supervised Universal and Specific Space Learning

Efficient and Effective Deep Multi-view Subspace Clustering

Two-stage deep learning for supervised cross-modal retrieval