Cross-modal video retrieval method and system based on multi-head self-attention mechanism and storage medium

Qi Shuhan, Wang Xuan, Ding Luo, Zhang Jiajia, Liao Qing, Liu Yang, Xia Wen, Jiang Lin
2021-01-01
Abstract: The invention provides a cross-modal video retrieval method and system based on a multi-head self-attention mechanism, and a storage medium. The cross-modal video retrieval method comprises a video coding step, a text coding step and a joint embedding step. The semantic information in the multi-modal training data is fully utilized during training, and a multi-head self-attention mechanism is introduced to capture fine-grained interactions within videos and texts and to attend selectively to the key information in the multi-modal data. This enhances the representational capability of the model and mines the data semantics more thoroughly. The distances between data items in the original space are kept consistent with their distances in the shared subspace. The invention is highly practical and easy to popularize. Its beneficial effects are that, as experiments show, the similarity of the data in the original space is effectively preserved and the retrieval accuracy is improved.
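The three steps named in the abstract (video coding, text coding, joint embedding) can be illustrated with a minimal sketch in plain Python. This is not the patented implementation: the dimensions, the mean pooling, the cosine similarity score, and every function name below are assumptions chosen for illustration, and the weights are random rather than trained. The abstract does not specify the training loss, so no training is shown.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights or features (illustrative only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, heads):
    """Scaled dot-product self-attention, one pass per head.

    x is a sequence of n feature vectors; heads is a list of
    (Wq, Wk, Wv) projection triples. The head outputs are
    concatenated per position, giving n vectors of size
    num_heads * d_head.
    """
    outputs = []
    for wq, wk, wv in heads:
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        scale = math.sqrt(len(wq[0]))
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        weights = [softmax([s / scale for s in row]) for row in scores]
        outputs.append(matmul(weights, v))
    return [sum((head[i] for head in outputs), []) for i in range(len(x))]


def encode(x, heads, w_out):
    """Attend, mean-pool over the sequence, project, then L2-normalize."""
    attended = multi_head_self_attention(x, heads)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    projected = matmul([pooled], w_out)[0]
    norm = math.sqrt(sum(v * v for v in projected)) or 1.0
    return [v / norm for v in projected]


def cosine_similarity(a, b):
    """Dot product of two unit vectors equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical sizes: per-frame and per-word feature widths, head width,
# number of heads, and the width of the shared subspace.
D_VIDEO, D_TEXT, D_HEAD, N_HEADS, D_SHARED = 6, 4, 3, 2, 5
rng = random.Random(0)


def make_encoder(d_in):
    """Build random attention heads and an output projection for one modality."""
    heads = [tuple(random_matrix(d_in, D_HEAD, rng) for _ in range(3))
             for _ in range(N_HEADS)]
    w_out = random_matrix(N_HEADS * D_HEAD, D_SHARED, rng)
    return heads, w_out


video_heads, video_out = make_encoder(D_VIDEO)
text_heads, text_out = make_encoder(D_TEXT)

video_frames = random_matrix(8, D_VIDEO, rng)  # 8 frames of stand-in features
query_words = random_matrix(5, D_TEXT, rng)    # 5 words of stand-in features

video_vec = encode(video_frames, video_heads, video_out)
text_vec = encode(query_words, text_heads, text_out)
score = cosine_similarity(text_vec, video_vec)
```

In a retrieval setting, `score` would be computed between one text query and every candidate video, and the videos ranked by it. With trained weights, the joint embedding step is what makes those scores meaningful across the two modalities.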