Abstract: Most image-text retrieval methods achieve accurate results by using fine-grained features for feature alignment. However, extracting robust features while maintaining retrieval accuracy in wireless communication remains a challenge, especially under channel noise and limited transmission bandwidth. Inspired by the spike signals of neurons in the human brain, we propose the neuron-based spiking transmission and reasoning network (NSTRN), which compresses features into compact, efficient representations. In NSTRN, we construct the feature sender based on a spiking activation function that selectively encodes only the important information in images and sentences into binary codes, reducing the transmission cost. Moreover, the feature receiver is designed as a recurrent architecture and applies both temporal attention and global attention blocks to memorize long-term information. Finally, to compensate for the loss of visual concepts in transmission, we use the global textual features as coefficients to guide the formation of visual features in the training stage. Traditional CNN-based joint source-channel coding models output floating-point encoded features, which require additional quantization steps to convert the features into binary bitstreams in a practical wireless communication system. In contrast, spiking neural networks (SNNs) directly use binary spike trains, reducing the computational complexity caused by the quantization steps. More importantly, SNNs can naturally encode asynchronous event streams and inhibit discrete noisy events to extract robust information. Even with binary bitstreams, NSTRN is effective compared with state-of-the-art image-text retrieval methods. In the wireless communication scenario, NSTRN not only reduces the transmission bandwidth but also alleviates, to a certain extent, the "cliff effect" of traditional separate encoding methods. To the best of our knowledge, this is the first work using SNNs for robust image-text retrieval.
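
To make the idea of encoding features directly into binary spike trains concrete, the following is a minimal sketch and not the NSTRN implementation. It assumes a simple leaky integrate-and-fire neuron with a hard reset as the spiking activation, a binary symmetric channel as a stand-in for channel noise, and firing-rate averaging as a placeholder for the receiver. The function names (`lif_encode`, `binary_symmetric_channel`, `rate_decode`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def lif_encode(features, num_steps=8, threshold=1.0, decay=0.5):
    """Encode float features into a binary spike train (assumed LIF neuron).

    Returns a uint8 array of shape (num_steps, *features.shape) holding 0/1.
    """
    features = np.asarray(features, dtype=np.float64)
    membrane = np.zeros_like(features)
    spikes = np.zeros((num_steps,) + features.shape, dtype=np.uint8)
    for t in range(num_steps):
        # Leaky integration of the constant input current.
        membrane = decay * membrane + features
        fired = membrane >= threshold
        spikes[t] = fired
        # Hard reset: neurons that fired start again from zero.
        membrane = np.where(fired, 0.0, membrane)
    return spikes


def binary_symmetric_channel(bits, flip_prob, rng):
    """Flip each transmitted bit independently with probability flip_prob."""
    flips = (rng.random(bits.shape) < flip_prob).astype(np.uint8)
    return np.bitwise_xor(bits, flips)


def rate_decode(spikes):
    """Recover a coarse feature estimate as the firing rate over time."""
    return spikes.mean(axis=0)


# Usage: weak features never cross the threshold and produce no spikes,
# so only the stronger activations consume transmission bits.
rng = np.random.default_rng(0)
features = np.array([0.0, 0.3, 0.6, 1.2])
sent = lif_encode(features)
received = binary_symmetric_channel(sent, flip_prob=0.05, rng=rng)
estimate = rate_decode(received)
```

In this sketch the spike train is already a bitstream, so no separate quantization step sits between the encoder and the channel, which is the property the abstract contrasts with floating-point joint source-channel coding models.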

Neuron-Based Spiking Transmission and Reasoning Network for Robust Image-Text Retrieval

Hierarchical Spiking-Based Model for Efficient Image Classification with Enhanced Feature Extraction and Encoding

Robust Transcoding Sensory Information with Neural Spikes

RSNN: Recurrent Spiking Neural Networks for Dynamic Spatial-Temporal Information Processing

CSNN: An Augmented Spiking Based Framework with Perceptron-Inception

Constructing Lightweight and Efficient Spiking Neural Networks for EEG-based Motor Imagery Classification

VTSNN: A Virtual Temporal Spiking Neural Network

Deep CovDenseSNN: A Hierarchical Event-Driven Dynamic Framework with Spiking Neurons in Noisy Environment

Motorsrnn: A Spiking Recurrent Neural Network Inspired by Brain Topology for the Effective and Efficient Decoding of Cortical Spike Trains

Towards Energy-Preserving Natural Language Understanding with Spiking Neural Networks

Spike Trains Encoding and Threshold Rescaling Method for Deep Spiking Neural Networks

Sparse Temporal Encoding of Visual Features for Robust Object Recognition by Spiking Neurons

Spike-based Encoding and Learning of Spectrum Features for Robust Sound Recognition

Codedretrieval: Joint Image Compression and Retrieval with Neural Networks

Receptive Field-Based All-Optical Spiking Neural Network for Image Processing

Spiking-Diffusion: Vector Quantized Discrete Diffusion Model with Spiking Neural Networks

Spiking Deep Residual Networks

Multi-Bit Mechanism: A Novel Information Transmission Paradigm for Spiking Neural Networks

CompSNN: A lightweight spiking neural network based on spatiotemporally compressive spike features

Event-driven Spiking Neural Network Based on Membrane Potential Modulation for Remote Sensing Image Classification

Multi-Scale Spiking Pyramid Wireless Communication Framework for Food Recognition