Multimodal Deep Embedding via Hierarchical Grounded Compositional Semantics.

Yueting Zhuang, Jun Song, Fei Wu, Xi Li, Zhongfei Zhang, Yong Rui
DOI: https://doi.org/10.1109/TCSVT.2016.2606648
IF: 5.859
2018-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: For a number of important problems, isolated semantic representations of individual syntactic words or visual objects do not suffice, but instead a compositional semantic representation is required; for example, a literal phrase or a set of spatially concurrent objects. In this paper, we aim to harness the existing image-sentence databases to exploit the compositional nature of image-sentence data...