Abstract: In this paper, we investigate the cross-modal material retrieval problem, in which the user submits a multimodal query comprising tactile and auditory modalities and retrieves image results from the visual modality. Because the modalities involved differ significantly, this task poses more challenges than existing cross-modal retrieval tasks. Our focus is on learning cross-modal representations when the modalities are significantly different and supervision is minimal. The novelty is a framework that combines weakly paired multimodal fusion for the heterogeneous tactile and auditory modalities with weakly paired cross-modal transfer for the visual modality. A structured dictionary learning method with low rank and a common classifier is developed to obtain the modal-invariant representation. Finally, cross-modal validations on publicly available data sets show the advantages of the proposed method.

Note to Practitioners: Cross-modal retrieval is an important task for industrial intelligence. In this paper, we establish a framework to effectively solve the cross-modal material retrieval problem. In this framework, the user submits a multimodal query comprising acceleration and sound signals of an object, and the system returns the most relevant images. Such a framework may find extensive applications in many fields, because it handles multimodal queries flexibly and uses only minimal category-label supervision, without requiring strong sample pairing information between modalities. This paper goes beyond previously proposed surface material classification approaches in that it returns an ordered list of perceptually similar surface materials for a query.

Cross-Modal Manifold Learning for Cross-modal Retrieval

Learning Visually Aligned Semantic Graph for Cross-Modal Manifold Matching

Learning an Image Manifold for Retrieval

Semantic Consistency Hashing for Cross-Modal Retrieval

Cross-Modal and Multimodal Data Analysis Based on Functional Mapping of Spectral Descriptors and Manifold Regularization

Image-based 3D Model Retrieval Using Manifold Learning

Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities

Manifold Learning Based Cross-media Retrieval: A Solution to Media Object Complementary Nature

Manifold learning through locally linear reconstruction based on Euclidean distance

Manifold Regularized Cross-Modal Embedding for Zero-Shot Learning

Image Retrieval Algorithms Based on Manifold Learning

Geometric Multimodal Learning Based on Local Signal Expansion for Joint Diagonalization

Multi-Manifold Deep Discriminative Cross-Modal Hashing for Medical Image Retrieval

Adversarial Cross-Modal Retrieval

Full-Space Local Topology Extraction for Cross-Modal Retrieval

Graph Embedding Learning for Cross-Modal Information Retrieval

Manifold information through neighbor embedding projection for image retrieval

Joint Dictionary Learning and Semantic Constrained Latent Subspace Projection for Cross-Modal Retrieval

Multi-level Alignment Network for Domain Adaptive Cross-modal Retrieval

Manifold-based Feature Point Matching for Multi-Modal Image Registration

Surface Material Retrieval Using Weakly Paired Cross-Modal Learning