Multimodal Data Mining in a Multimedia Database Based on Structured Max Margin Learning

Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, Christos Faloutsos
DOI: https://doi.org/10.1145/2742549
IF: 4.157
2016-02-24
ACM Transactions on Knowledge Discovery from Data
Abstract: Mining knowledge from a multimedia database has received increasing attention recently, since huge repositories have been made available by the development of the Internet. In this article, we exploit the relations among different modalities in a multimedia database and present a framework for the general multimodal data mining problem, of which image annotation and image retrieval are special cases. Specifically, the multimodal data mining problem can be formulated as a structured prediction problem in which we learn the mapping from an input to structured and interdependent output variables. In addition, to reduce the demanding computation, we propose a new max margin structure learning approach called the Enhanced Max Margin Learning (EMML) framework, which is much more efficient, with a much faster convergence rate, than the existing max margin learning methods, as verified through empirical evaluations. Furthermore, we apply the EMML framework to develop an effective and efficient solution to the multimodal data mining problem that is highly scalable in the sense that the query response time is independent of the database scale. The EMML framework allows efficient multimodal data mining queries in a very large-scale multimedia database, and outperforms many existing multimodal data mining methods in the literature that do not scale up at all. A performance comparison with a state-of-the-art multimodal data mining method is reported on real-world image databases.
computer science, information systems, software engineering