Introduction to the Special Issue on Advanced Approaches for Multiple Instance Learning on Multimedia Applications
Pourya Shamsolmoali, Ruili Wang, A. H. Sadka
DOI: https://doi.org/10.1145/3459603
IF: 4.094
2021-01-01
ACM Transactions on Multimedia Computing Communications and Applications
Abstract: Multiple instance learning (MIL) is a form of weakly supervised learning in which training data are arranged in sets called bags; the points inside a bag are called instances, and a label is provided only for the entire bag. What distinguishes the MIL paradigm is that the labels of the bags are known while the labels of the instances are not, unlike classical supervised classification, where the label of every point is known. This paradigm is gaining interest because it naturally fits various problems and allows weakly labeled data to be leveraged. The MIL paradigm has been widely applied in multimedia applications, such as image processing, signal processing, text processing, and drug design. However, learning from bags raises important challenges that are unique to MIL, such as the composition of the bags, the ambiguity of instance labels, and the tasks to be performed.
This special issue collects seven papers reporting recent developments of MIL in multimedia applications. “Multi-Peak Graph-Based Multi-Instance Learning for Weakly Supervised Object Detection” introduces a multi-peak graph-based model for weakly supervised object detection. Specifically, the authors use an instance graph to create relations between proposals, which reinforces the MIL process. In addition, a multi-peak discovery strategy is designed to avoid mislabeling instances. In “A Multiple Sieve Approach Based on Artificial Intelligent Techniques and Correlation Power Analysis,” the authors study the reason for premature convergence and propose a multiple sieve method that overcomes the convergence issues and reduces the number of traces required in correlation power attacks. The authors of “A Multi-Instance Multi-Label Dual Learning Approach for Video Captioning” propose a novel encoder-decoder-reconstructor-based multi-instance multi-label dual learning approach to generate video captions.
In “Equivariant Adversarial Network for Image-to-Image Translation,” the authors propose a new framework for the capsule network. In this model, a newly designed capsule is assigned to each capsule’s entity and uses a trainable function over a transformation to project the input vector onto these capsules. In this transformation, the role of the prediction is to discover the degree of alignment between the input vector and the learned capsules. “A Multi-Agent Feature Selection and Hybrid Classification Model for Parkinson’s Disease Diagnosis” develops a novel model to select the best features from a voice dataset. The algorithm is designed to select a set of features that improves the overall performance of prediction models and prevents the overfitting that might result from extreme reduction of the features. Moreover, the algorithm aims to reduce the complexity of the prediction, speed up the training phase, and build a robust training model.
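The bag-and-instance setup described at the start of the abstract can be illustrated with a minimal Python sketch. It is not taken from any of the collected papers. It assumes the standard MIL assumption (a bag is positive if at least one of its instances is positive), which the abstract does not state, and the names `Bag`, `predict_bag`, and `toy_scorer` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Bag:
    """A set of instances with a single bag-level label."""
    instances: List[Sequence[float]]  # feature vectors; instance labels are unknown
    label: int                        # 1 = positive bag, 0 = negative bag


def predict_bag(
    bag: Bag,
    score_instance: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> int:
    """Predict a bag label by max-pooling the instance scores.

    Assumption (standard MIL assumption, not stated in the abstract):
    a bag is positive if at least one instance is positive, so the
    highest-scoring instance decides the bag label.
    """
    scores = [score_instance(x) for x in bag.instances]
    return int(max(scores) >= threshold)


def toy_scorer(x: Sequence[float]) -> float:
    """Hypothetical instance scorer: the first feature is the score."""
    return x[0]


positive_bag = Bag(instances=[[0.1], [0.9], [0.2]], label=1)
negative_bag = Bag(instances=[[0.2], [0.3]], label=0)

predictions = [predict_bag(b, toy_scorer) for b in (positive_bag, negative_bag)]
```

Only the bag labels are used for supervision here. In practice the instance scorer would be learned, and the max-pooling step is one of several possible ways to aggregate instance scores.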