Special Issue on Cross-Modal Retrieval and Analysis
Jianlong Wu, Richang Hong, Qi Tian
DOI: https://doi.org/10.1007/s13735-022-00265-2
2022-01-01
International Journal of Multimedia Information Retrieval
Abstract: With the development of the Internet and social media, large amounts of multimedia data are generated and uploaded every day. Although these data may come in different modalities, such as text, images, video, and audio, they are semantically correlated. Effective cross-modal and multi-modal learning creates great opportunities for many practical applications, such as cross-modal retrieval, matching, recommendation, and classification, which play important roles in public security, social media, entertainment, healthcare, etc. However, because cross-modal data are inherently heterogeneous, it is very challenging to model the correlation among data of different modalities for practical tasks. This special issue aims to assemble recent advances in cross-modal retrieval and analysis that address these problems and benefit researchers in the field. It is a joint special issue organized in cooperation with the China Multimedia Conference 2022. We received 36 submissions, and seven papers were selected for publication after at least a double peer-review process. We are pleased to present them below. To investigate the precise inter-modality relationship for cross-modal retrieval tasks, the paper "Prototype Local-Global Alignment Network for Image-Text Retrieval" by L. Meng, F. Zhang, X. Zhang and C. Xu presents a novel framework that jointly performs fine-grained local alignment and high-level global alignment. On the one hand, prototype-based local alignment divides region-word alignment into region-prototype and word-prototype alignment, which can bridge the modality gap well and avoid