Reservoir Computing Transformer for Image-Text Retrieval

Wenrui Li, Zhengyu Ma, Liang-Jian Deng, Penghong Wang, Jinqiao Shi, Xiaopeng Fan
DOI: https://doi.org/10.1145/3581783.3611758
2023-01-01
Abstract: Although the attention mechanism in transformers has proven successful in image-text retrieval tasks, most transformer models suffer from a large number of parameters. Inspired by brain circuits that process information with recurrently connected neurons, we propose a novel Reservoir Computing Transformer Reasoning Network (RCTRN) for image-text retrieval. The proposed RCTRN employs a two-step strategy that addresses the feature representation and the data distribution of the different modalities in turn. Specifically, we send visual and textual features through a unified meshed reasoning module, which encodes multi-level feature relationships with prior knowledge and aggregates the complementary outputs more effectively. A reservoir reasoning network (RRN) is proposed to optimize memory connections between features at different stages and to address the data distribution mismatch introduced by the unified scheme. To investigate the significance of the RRN's low power dissipation and low bandwidth in practical scenarios, we deployed the model in a wireless transmission system, demonstrating that the RRN's optimization of data structures also provides some robustness against channel noise. Extensive experiments on two benchmark datasets, Flickr30K and MS-COCO, demonstrate the superiority of RCTRN over state-of-the-art baselines in terms of both performance and power dissipation.
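For readers unfamiliar with reservoir computing, the sketch below illustrates the general idea behind a reservoir: a fixed, randomly initialized recurrent layer whose state carries a memory of earlier inputs, so that only a small readout on top needs training. It is a generic leaky echo-state update written in plain Python, not the RRN described in the abstract; the function names (`make_reservoir`, `step`, `run`) and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def make_reservoir(n_in, n_res, scale=0.5, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    # Each recurrent weight is at most scale / n_res in magnitude, so every
    # row's absolute sum is at most scale (< 1). This keeps the state update
    # contractive, so the influence of old inputs fades over time.
    w_res = [
        [rng.uniform(-1, 1) * scale / n_res for _ in range(n_res)]
        for _ in range(n_res)
    ]
    return w_in, w_res


def step(state, x, w_in, w_res, leak=0.3):
    """One leaky-integrator update of the reservoir state."""
    new_state = []
    for i in range(len(state)):
        pre = sum(w * v for w, v in zip(w_in[i], x))
        pre += sum(w * s for w, s in zip(w_res[i], state))
        new_state.append((1 - leak) * state[i] + leak * math.tanh(pre))
    return new_state


def run(sequence, w_in, w_res, leak=0.3):
    """Feed a sequence of feature vectors through the reservoir."""
    state = [0.0] * len(w_res)
    for x in sequence:
        state = step(state, x, w_in, w_res, leak)
    return state


if __name__ == "__main__":
    w_in, w_res = make_reservoir(n_in=3, n_res=8)
    features = [[0.1, 0.2, 0.3], [0.0, -0.5, 0.4]]  # toy feature vectors
    print(run(features, w_in, w_res))
```

Because the recurrent weights stay fixed, the trainable parameter count in a setup like this is limited to whatever readout is placed on the final state, which is the usual argument for reservoirs being lightweight.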