Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval.

Yuting Hu, Liang Zheng, Yi Yang, Yongfeng Huang
DOI: https://doi.org/10.1109/TMM.2017.2760101
IF: 7.3
2018-01-01
IEEE Transactions on Multimedia
Abstract: This paper contributes a new large-scale dataset for weakly supervised cross-media retrieval, named Twitter100k. Current datasets, such as Wikipedia, NUS Wide, and Flickr30k, have two major limitations. First, these datasets are lacking in content diversity, i.e., only some predefined classes are covered. Second, texts in these datasets are written in well-organized language, leading to inconsiste...