Abstract: Complex objects usually carry multiple labels and can be described by multiple modal representations; for example, a complex article contains text and image information as well as multiple annotations. Previous methods assume that homogeneous multi-modal data are consistent, whereas in real applications the raw data are disordered: an article, for instance, consists of a variable number of inconsistent text and image instances. Multi-modal Multi-instance Multi-label (M3) learning provides a framework for handling such tasks and has exhibited excellent performance. However, M3 learning faces two main challenges: 1) how to effectively utilize label correlation, and 2) how to take advantage of multi-modal learning to process unlabeled instances. To address these problems, we first propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN), which performs M3 learning in an end-to-end multi-modal deep network and applies the consistency principle among the bag-level predictions of different modalities. Based on M3DN, we learn the latent ground label metric with optimal transport. Moreover, we introduce extrinsic unlabeled multi-modal multi-instance data and propose M3DNS, which uses an instance-level auto-encoder for each single modality and a modified bag-level optimal transport to strengthen the consistency among modalities. M3DNS can thereby better predict labels and exploit label correlation simultaneously. Experiments on benchmark datasets and the real-world WKG Game-Hub dataset validate the effectiveness of the proposed methods.

Multi-Modal Image Annotation with Multi-Instance Multi-Label LDA

Dual Enhancement for Multi-Label Learning with Missing Labels

Multi-Modal Multi-Label Semantic Indexing of Images Based on Hybrid Ensemble Learning

Multi-Modal Multi-Label Semantic Indexing of Images Using Unlabeled Data

M3LA: A Novel Approach Based on Encoder-Decoder with Attention Framework for Multi-modal Multi-label Learning

Weakly-Supervised Multi-view Multi-instance Multi-label Learning

Collaboration based multi-modal multi-label learning

Cascade of Multi-level Multi-instance Classifiers for Image Annotation

Complex Object Classification

Labeling Complicated Objects: Multi-View Multi-Instance Multi-Label Learning

Semi-Supervised Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport

Multi-Instance Multi-Label Learning with Application to Scene Classification

Content and Context-Based Multi-Label Image Annotation

Correlative multi-label multi-instance image annotation

Supervised LDA for Image Annotation

Multi-Modal Multi-Instance Multi-Label Learning with Graph Convolutional Network

Ensemble Multi-Instance Multi-Label Learning Approach for Video Annotation Task

Label distribution for multimodal machine learning

A New multi-instance multi-label learning approach for image and text classification

Automatic image annotation via local multi-label classification

A Multiple Instance Learning Approach to Image Annotation with Saliency Map