Continual learning for cross-modal image-text retrieval based on domain-selective attention

Rui Yang, Shuang Wang, Yu Gu, Jihui Wang, Yingzhi Sun, Huan Zhang, Yu Liao, Licheng Jiao
DOI: https://doi.org/10.1016/j.patcog.2024.110273
IF: 8
2024-01-21
Pattern Recognition
Abstract: Cross-modal image-text retrieval (CMITR) has been an important research topic for more than a decade. Most previous studies train on the data of all tasks jointly, as a single set. In practice, however, a dataset is more likely to consist of multiple tasks that are trained in sequence. As a result, the model's ability to retain an old task degrades once a new task arrives, a phenomenon known as catastrophic forgetting. To address this issue, this paper proposes a novel continual learning method for cross-modal image-text retrieval (CLCMR) that alleviates catastrophic forgetting. We construct a network based on multilayer domain-selective attention (MDSA) that acquires knowledge at both task-relevant and domain-specific attention levels. In addition, a memory factor is designed to perform weight regularization, and a novel memory loss function is used to constrain the MDSA. Extensive experiments on multiple datasets (Wikipedia, Pascal Sentence, and PKU XMediaNet) demonstrate that CLCMR effectively alleviates catastrophic forgetting and achieves better continual learning performance than state-of-the-art methods.
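The abstract does not specify how the memory factor enters the training objective. As a rough illustration only, the sketch below assumes an EWC-style quadratic penalty, in which a per-weight importance score (called `memory_factor` here, a hypothetical name) scales how strongly each weight is pulled back toward its value after the previous task. The function names, the importance values, and the strength `lam` are all assumptions for illustration; this is not the CLCMR formulation.

```python
from typing import Sequence


def memory_penalty(
    params: Sequence[float],
    old_params: Sequence[float],
    memory_factor: Sequence[float],
    lam: float = 1.0,
) -> float:
    """EWC-style penalty: (lam / 2) * sum_i m_i * (theta_i - theta_old_i) ** 2.

    Hypothetical stand-in for a memory-factor weight regularizer;
    not the loss used in the CLCMR paper.
    """
    if not (len(params) == len(old_params) == len(memory_factor)):
        raise ValueError("params, old_params and memory_factor must have equal length")
    # Each weight's drift from its old-task value is weighted by its assumed importance.
    return 0.5 * lam * sum(
        m * (p - q) ** 2 for p, q, m in zip(params, old_params, memory_factor)
    )


def total_loss(
    task_loss: float,
    params: Sequence[float],
    old_params: Sequence[float],
    memory_factor: Sequence[float],
    lam: float = 1.0,
) -> float:
    # Loss on the new retrieval task plus the penalty protecting old-task weights.
    return task_loss + memory_penalty(params, old_params, memory_factor, lam)


if __name__ == "__main__":
    old = [0.5, -1.0, 2.0]        # weights after training on the previous task
    new = [0.7, -1.0, 1.0]        # current weights while training the new task
    importance = [4.0, 1.0, 0.1]  # assumed memory factor per weight
    print(memory_penalty(new, old, importance, lam=2.0))
```

Weights with a large memory factor are expensive to move, so the new task is learned mainly through weights that matter less to earlier tasks; this is the general idea behind importance-weighted regularizers such as EWC. In the example, the large shift of the third weight costs little because its assumed importance is only 0.1.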
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic