Abstract: Many real-world applications involve data from multiple modalities and thus exhibit view heterogeneity. For example, user modeling on social media might leverage both the topology of the underlying social network and the content of the users' posts; in the medical domain, the multiple views could be X-ray images taken at different poses. To date, various techniques, such as methods based on canonical correlation analysis, have been proposed and have achieved promising results. Meanwhile, it is critical for decision-makers to be able to understand the predictions these methods produce. For example, given a diagnostic result that a model produced from X-ray images of a patient at different poses, the doctor needs to know why the model made that prediction. However, state-of-the-art techniques usually fail to utilize the complementary information of each view and to explain their predictions in an interpretable manner. To address these issues, we propose a deep co-attention network for multi-view subspace learning, which aims to extract both the common information and the complementary information in an adversarial setting and to provide robust interpretations of its predictions to end-users via the co-attention mechanism. In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation by incorporating the classifier into the model. This improves the quality of the latent representation and accelerates convergence. Finally, we develop an efficient iterative algorithm to find the optimal encoders and discriminator, which we evaluate extensively on synthetic and real-world data sets. We also conduct a case study to demonstrate how the proposed method robustly interprets the predictions on an image data set.

3View deep canonical correlation analysis for cross-modal retrieval

Towards Improving Canonical Correlation Analysis for Cross-modal Retrieval

Deep Canonical Correlation Analysis with Progressive and Hypergraph Learning for Cross-Modal Retrieval

Two-stage deep learning for supervised cross-modal retrieval

Image Retrieval Approach Based on Sparse Canonical Correlation Analysis

Cross-modal Retrieval Combining Deep Canonical Correlation Analysis and Adversarial Learning

Cross-Modal Subspace Clustering Via Deep Canonical Correlation Analysis

Deep Co-Attention Network for Multi-View Subspace Learning

A Convex Discriminant Semantic Correlation Analysis for Cross-View Recognition

Intra-View and Inter-View Supervised Correlation Analysis for Multi-View Feature Learning

Multi-Modal Retrieval Via Deep Textual-Visual Correlation Learning

End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss

Nonnegative Constrained Graph Based Canonical Correlation Analysis for Multi-view Feature Learning

Tensor Canonical Correlation Analysis for Multi-View Dimension Reduction

Exploring Deep Learning for View-Based 3D Model Retrieval

A Dynamic Discriminative Canonical Correlation Analysis via Adaptive Weight Scheme

Rank Canonical Correlation Analysis and Its Application in Visual Search Reranking

Cross-Modal Image Clustering Via Canonical Correlation Analysis

A New Approach to Cross-Modal Retrieval

Modeling Intra- and Inter-Pair Correlation Via Heterogeneous High-Order Preserving for Cross-Modal Retrieval

Variational Autoencoder with CCA for Audio-Visual Cross-Modal Retrieval