A Domain Robust Approach for Image Dataset Construction

Yazhou Yao, Xian-Sheng Hua, Fumin Shen, Jian Zhang, Zhenmin Tang
DOI: https://doi.org/10.1145/2964284.2967213
2016-01-01
Abstract: There has been increasing research interest in automatically constructing image datasets by collecting images from the Internet. However, existing methods tend to have weak domain adaptation ability, a shortcoming known as the "dataset bias problem". To address this issue, we propose a novel image dataset construction framework that generalizes well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpora (GBNC) to obtain a richer semantic description, and the noisy query expansions are then filtered out. By treating each expansion as a "bag" and the images retrieved for it as "instances", we formulate image filtering as a multi-instance learning (MIL) problem with constrained positive bags. With this approach, images from different data distributions are retained while noisy images are filtered out. Comprehensive experiments on two challenging tasks demonstrate the effectiveness of the proposed approach.
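The abstract does not state the paper's objective function or solver, so the following is only a rough illustration of the bag/instance view it describes. Each query expansion is treated as a bag of image feature vectors, and an alternating label-assignment heuristic in the spirit of mi-SVM is applied, with a simple nearest-centroid scorer swapped in for the SVM. The "constrained positive bag" is modeled here by a hypothetical parameter `min_pos_ratio`: the minimum fraction of instances in each bag that must stay labeled positive. The function names, the toy 2-D features, and the fixed set of negative examples are all assumptions for illustration, not the authors' formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty sequence of vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filter_bags(
    positive_bags: Dict[str, List[Vector]],
    negatives: List[Vector],
    min_pos_ratio: float = 0.5,  # hypothetical bag constraint, not from the paper
    max_iter: int = 20,
) -> Dict[str, List[int]]:
    """Return, for each positive bag, the indices of instances kept as positive.

    Hypothetical simplification: mi-SVM-style alternation with a
    nearest-centroid scorer and a fixed negative set.
    """
    # Start by assuming every instance in every positive bag is positive.
    labels = {name: [True] * len(bag) for name, bag in positive_bags.items()}
    neg_c = centroid(negatives)
    for _ in range(max_iter):
        # Re-estimate the positive centroid from the currently kept instances.
        pos_points = [x for name, bag in positive_bags.items()
                      for x, keep in zip(bag, labels[name]) if keep]
        pos_c = centroid(pos_points)
        changed = False
        for name, bag in positive_bags.items():
            # Positive score: closer to the positive centroid than the negative one.
            scores = [sq_dist(x, neg_c) - sq_dist(x, pos_c) for x in bag]
            new = [s > 0 for s in scores]
            # Bag constraint: keep at least ceil(ratio * |bag|) instances positive.
            k = max(1, math.ceil(min_pos_ratio * len(bag)))
            if sum(new) < k:
                top = sorted(range(len(bag)), key=lambda i: scores[i], reverse=True)[:k]
                new = [i in top for i in range(len(bag))]
            if new != labels[name]:
                labels[name] = new
                changed = True
        if not changed:  # labels are stable, stop alternating
            break
    return {name: [i for i, keep in enumerate(lab) if keep]
            for name, lab in labels.items()}


if __name__ == "__main__":
    # Toy example: two expansions of the query "horse"; one noisy image in the first bag.
    bags = {
        "jumping horse": [(0.9, 0.1), (1.0, 0.0), (0.1, 0.9)],
        "racing horse": [(0.8, 0.2), (0.95, 0.05)],
    }
    negatives = [(0.0, 1.0), (0.1, 0.95)]
    print(filter_bags(bags, negatives))
```

On this toy input the third image of "jumping horse" is dropped, while the constraint prevents any bag from being emptied entirely. That is the property the abstract emphasizes: images from each expansion (and hence each data distribution) are retained, and only the outliers within a bag are filtered.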
What problem does this paper attempt to address?