Cross-modal retrieval by an end to end way

Bang Hu Yin, Xiu Li
DOI: https://doi.org/10.1088/1757-899x/715/1/012082
2020-01-01
IOP Conference Series: Materials Science and Engineering
Abstract: Cross-modal retrieval has attracted considerable attention in recent years. For images and text, measuring the semantic similarity between the two modalities remains a challenging problem in the cross-modal retrieval task. In this paper, we propose a cross-modal retrieval method that uses a CNN to obtain the semantic similarity between an image and a text. Most existing methods employ a separate part for each modality to obtain the semantic similarity between them. In contrast, our work uses a single CNN to obtain the semantic similarity, with no need for separate networks. In addition, because we aim to handle retrieval between long texts and images, we use a topic model to process the text. We evaluate our approach on the Wikipedia dataset.