Abstract: Multiple instance learning (MIL) is a form of weakly supervised learning in which training data are arranged in sets called bags. The points inside a bag are called instances, and a label is provided only for the bag as a whole. The distinguishing feature of the MIL paradigm is therefore that the bag labels are known while the instance labels are not, unlike classical supervised classification, where the label of every point is known. This paradigm is gaining interest because it naturally fits various problems and makes it possible to leverage weakly labeled data. MIL has been widely applied in multimedia tasks such as image, signal, and text processing, as well as in drug design. However, learning from bags raises important challenges that are unique to MIL, such as the composition of the bags, the ambiguity of instance labels, and the tasks to be performed. This special issue collects seven papers reporting recent developments of MIL in multimedia applications. “Multi-Peak Graph-Based Multi-Instance Learning for Weakly Supervised Object Detection” introduces a multi-peak graph-based model for weakly supervised object detection. Specifically, the authors use an instance graph to create relations between proposals, and these relations reinforce the MIL process. In addition, a multi-peak discovery strategy is designed to avoid mislabeling instances. In “A Multiple Sieve Approach Based on Artificial Intelligent Techniques and Correlation Power Analysis,” the authors study the reason for premature convergence and propose a multiple sieve method that overcomes the convergence issues and reduces the number of traces required in correlation power attacks. The authors of “A Multi-Instance Multi-Label Dual Learning Approach for Video Captioning” propose a novel encoder-decoder-reconstructor-based multi-instance multi-label dual learning approach to generate video captions.
In “Equivariant Adversarial Network for Image-to-Image Translation,” the authors propose a new framework for the capsule network. In this model, a newly designed capsule is assigned to each capsule’s entity and uses a trainable function over a transformation to project the input vector onto these capsules. In this transformation, the role of the prediction is to discover the degree of alignment between the input vector and the learned capsules. “A Multi-Agent Feature Selection and Hybrid Classification Model for Parkinson’s Disease Diagnosis” aims to develop a novel model that selects the best features from a voice dataset. The algorithm is designed to select a set of features that improves the overall performance of prediction models while preventing the overfitting that might result from an extreme reduction of the features. Moreover, the algorithm aims to reduce the complexity of the prediction, speed up the training phase, and build a robust training model.
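
As a minimal illustration of the bag and instance setup described at the start of this abstract, the following Python sketch represents each bag as a list of unlabeled instances with a single bag-level label. The decision rule in `predict_bag` is the commonly used standard MIL assumption (a bag is positive if at least one instance is positive); that rule, the names `Bag`, `predict_bag`, and `toy_score`, and the toy data are assumptions made for illustration and are not taken from the papers in this issue.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Bag:
    """A bag groups instances; only the bag-level label is observed."""
    instances: List[Sequence[float]]  # feature vectors; instance labels are unknown
    label: int                        # 1 = positive bag, 0 = negative bag


def predict_bag(bag: Bag,
                score_instance: Callable[[Sequence[float]], float],
                threshold: float = 0.5) -> int:
    """Bag-level prediction under the standard MIL assumption (assumed here):
    the bag is positive if its highest-scoring instance exceeds the threshold."""
    best = max(score_instance(x) for x in bag.instances)
    return int(best > threshold)


def toy_score(x: Sequence[float]) -> float:
    """Hypothetical instance scorer: uses the first feature as the score."""
    return x[0]


# Toy data: two bags with two instances each (values are made up).
bags = [
    Bag(instances=[[0.1, 0.3], [0.9, 0.2]], label=1),
    Bag(instances=[[0.2, 0.4], [0.3, 0.1]], label=0),
]
predictions = [predict_bag(b, toy_score) for b in bags]
```

Max pooling over instance scores is only one way to aggregate; other tasks mentioned above, such as instance-level labeling, would need a different objective.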
