Mining Ambiguous Data with Multi-instance Multi-label Representation

Zhi-Hua Zhou
DOI: https://doi.org/10.1007/978-3-540-73871-8_1
2007-01-01
Abstract: In traditional data mining and machine learning settings, an object is represented by an instance (or feature vector) which is associated with a class label. However, real-world data are often ambiguous, and an object may be associated with a number of instances and a number of class labels simultaneously. For example, an image usually contains multiple salient regions, each of which can be represented by an instance, while in image classification such an image can belong to several classes, such as lion, grassland and tree, simultaneously. Another example is text categorization, where a document usually contains multiple sections, each of which can be represented as an instance, and the document can be regarded as belonging to different categories, such as scientific novel, Jules Verne's writing or even books on travelling, simultaneously. Web mining is another example, where each of the links or linked pages can be regarded as an instance while the web page itself can be recognized as a news page, sports page, soccer page, etc. This talk will introduce a new learning framework, multi-instance multi-label learning (MIML), which is one option for addressing such problems.
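The representation described in the abstract can be illustrated with a minimal sketch. The class name `MIMLExample`, its field names, and the feature values are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class MIMLExample:
    """One object: a bag of instances plus a set of class labels (hypothetical type)."""
    instances: List[List[float]]  # one feature vector per region, section, or link
    labels: Set[str]              # every class the object belongs to


# Hypothetical image with three salient regions; the feature values are made up.
image = MIMLExample(
    instances=[[0.9, 0.1], [0.2, 0.8], [0.4, 0.5]],
    labels={"lion", "grassland", "tree"},
)

# The traditional setting is the special case of a bag holding exactly
# one instance and exactly one label.
traditional = MIMLExample(instances=[[0.3, 0.7]], labels={"lion"})
```

Under this sketch, a MIML learner would map a whole bag of instances to a set of labels, rather than mapping one feature vector to one label.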