Abstract: Multi-concept image query is a multi-label classification challenge. Traditional query methods focus on single-concept queries and use only the visual data of an image, without considering its associated textual tags. In this work, we address the problem of bimodal multi-concept image query, that is, retrieving from an image set the bimodal images that contain multiple target concepts. We propose a novel Bimodal Learning for Multi-concept image Query (BLMQ) model based on visual and textual data, which can use the associated textual tags to enhance query performance. Furthermore, for a multi-concept query (e.g., the scene query "sky, cloud, sunset"), we directly employ multi-concept detectors to identify the multiple concepts in an image, instead of heuristically combining single-concept query techniques. Semantic interdependencies among concepts are leveraged to further improve multi-concept query performance. Because our method models the links among visual content, semantic interdependencies, and textual tags, it can achieve better performance. To investigate the feasibility and effectiveness of our approach, we conduct experiments on the NUS-Wide 260K dataset. The experimental results show that our approach significantly outperforms several state-of-the-art retrieval methods.

Bimodal Learning for Multi-concept Image Query