Abstract: Many machine learning problems are concerned with discovering or associating common patterns in data of multiple views or modalities. Multi-view learning is one of the methods to achieve such goals. Recent methods build deep multi-view networks by adapting generic Deep Neural Networks (DNNs) to concatenate features of individual views at intermediate network layers (i.e., fusion layers). In this work, we study the problem of multi-view learning in such end-to-end networks. We take a regularization approach via multi-view learning criteria, and propose a novel, effective, and efficient neuron-wise correlation-maximizing regularizer. We implement our proposed regularizers collectively as a correlation-regularized network layer (CorrReg). CorrReg can be applied to either fully-connected or convolutional fusion layers, simply by replacing them with their CorrReg counterparts. By partitioning the neurons of a hidden layer in generic DNNs into multiple subsets, we also consider a multi-view feature learning perspective of generic DNNs. This perspective enables us to study deep multi-view learning in the context of regularized network training, for which we present controlled experiments on benchmark image classification to show the efficacy of CorrReg. To investigate how useful CorrReg is for practical multi-view learning problems, we conduct experiments on RGB-D object/scene recognition and multi-view based 3D object recognition, using networks with fusion layers that concatenate intermediate features of individual modalities or views for subsequent classification. Applying CorrReg to the fusion layers of these networks consistently improves classification performance. In particular, we achieve a new state of the art on the benchmark RGB-D object and RGB-D scene datasets. We make the implementation of CorrReg publicly available.
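The abstract does not give the exact form of the neuron-wise regularizer, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that neuron j of one view's fusion-layer features is paired with neuron j of another view. It also assumes the penalty is the sum over neurons of (1 - Pearson correlation) computed across a mini-batch, so minimizing the penalty maximizes each neuron-wise correlation. The even, contiguous split of a hidden layer into "views" illustrates the multi-view perspective on generic DNNs. The function names (`pearson`, `neuronwise_corr_penalty`, `multiview_penalty`, `split_hidden_layer`) are hypothetical and are not the released CorrReg API.

```python
import math
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]  # batch_size x num_neurons activations


def pearson(x: Sequence[float], y: Sequence[float], eps: float = 1e-8) -> float:
    """Pearson correlation of two activation columns over a mini-batch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # eps avoids division by zero for constant (dead) neurons
    return cov / (math.sqrt(vx * vy) + eps)


def neuronwise_corr_penalty(view_a: Matrix, view_b: Matrix) -> float:
    """Assumed form: sum_j (1 - corr(a_j, b_j)), pairing neuron j of view A
    with neuron j of view B. Lower penalty means higher correlation."""
    d = len(view_a[0])
    if any(len(r) != d for r in view_a + view_b) or len(view_a) != len(view_b):
        raise ValueError("views must share batch size and width")
    penalty = 0.0
    for j in range(d):
        col_a = [row[j] for row in view_a]
        col_b = [row[j] for row in view_b]
        penalty += 1.0 - pearson(col_a, col_b)
    return penalty


def multiview_penalty(views: List[Matrix]) -> float:
    """Assumed extension to more than two views: sum over all view pairs."""
    return sum(neuronwise_corr_penalty(a, b) for a, b in combinations(views, 2))


def split_hidden_layer(h: Matrix, num_views: int) -> List[Matrix]:
    """Hypothetical partition of a generic hidden layer into num_views
    equal, contiguous neuron subsets, each treated as one 'view'."""
    d = len(h[0])
    if d % num_views:
        raise ValueError("hidden width must be divisible by num_views")
    k = d // num_views
    return [[row[v * k:(v + 1) * k] for row in h] for v in range(num_views)]


if __name__ == "__main__":
    a = [[1.0, 2.0], [2.0, 4.0], [3.0, 1.0]]
    b = [[2.0, -2.0], [4.0, -4.0], [6.0, -1.0]]
    # neuron 0 is perfectly correlated, neuron 1 perfectly anti-correlated
    print(round(neuronwise_corr_penalty(a, b), 6))
```

In training, such a penalty would be added to the task loss with a weight (e.g. `loss = task_loss + lam * penalty`, where `lam` is a hypothetical hyperparameter) and computed in an autodiff framework so that it can be backpropagated. Plain Python is used here only to keep the arithmetic transparent.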

Common Representation Learning Using Step-based Correlation Multi-Modal CNN

Multi-View Correlated Feature Learning by Uncovering Shared Component

Deep Multi-View Learning using Neuron-Wise Correlation-Maximizing Regularizers

Deep Multimodal Representation Learning from Temporal Data

Canonical Correlation Guided Deep Neural Network

Multimodal correlation deep belief networks for multi-view classification

Learning Partial Correlation based Deep Visual Representation for Image Classification

Graph Convolution Network Based Representation for Multi-View Multi-Label Learning

Multi-view Emotion Recognition Using Deep Canonical Correlation Analysis

Multi-Modal Retrieval Via Deep Textual-Visual Correlation Learning

Composite Nonlinear Multiset Canonical Correlation Analysis for Multiview Feature Learning and Recognition

Multimodal Learning of Social Image Representation by Exploiting Social Relations

Deep Correlated Joint Network for 2-D Image-Based 3-D Model Retrieval

Laplacian multiset canonical correlations for multiview feature extraction and image recognition

Generalized Multi-view Embedding for Visual Recognition and Cross-modal Retrieval

Nonnegative Constrained Graph Based Canonical Correlation Analysis for Multi-view Feature Learning

Poster Abstract: Representation Learning from Multimodal Sensor Data with Maximally Correlated Autoencoders

Correlated and Individual Multi-Modal Deep Learning for RGB-D Object Recognition

Cross-media Residual Correlation Learning

Pairwise Decomposition of Image Sequences for Active Multi-view Recognition

Common Subspace Based Low-Rank and Joint Sparse Representation for Multi-view Face Recognition