Abstract: Spiking neural networks (SNNs) have captured apparent interest in recent years, stemming from neuroscience and reaching the field of artificial intelligence. However, due to their nature, SNNs remain far behind deep neural networks (DNNs) in performance. As a result, many scholars are exploring ways to enhance SNNs by using learning techniques from DNNs. While this approach has been shown to achieve considerable improvements in SNN performance, we propose another perspective: enhancing the biological plausibility of the models to fully leverage the advantages of SNNs. Our approach aims to propose a brain-like combination of audio-visual signal processing for recognition tasks, intended to succeed in more bio-plausible human-robot interaction applications.

What problem does this paper attempt to address?

This paper addresses how to improve the performance of spiking neural networks (SNNs) in multimodal perception tasks by enhancing their biological plausibility, particularly for audio-visual signal processing in robotics. Specifically, it proposes a bio-inspired method that uses different encoding schemes for image and audio inputs to improve the data representation ability of SNNs, with the goal of more biologically plausible human-robot interaction applications.

### Main research questions:

1. **Improving the biological plausibility of SNNs**: Although existing SNNs have advantages in energy efficiency, their performance still lags well behind that of deep neural networks (DNNs). The paper attempts to narrow this gap by enhancing the biological plausibility of SNNs.
2. **Multimodal data processing**: How to combine image and audio inputs effectively so as to mimic the way the mammalian brain processes audio-visual information synchronously, thereby improving the performance of SNNs in multimodal perception tasks.
3. **Optimizing encoding schemes and learning mechanisms**: How to select encoding schemes and learning mechanisms that exploit the advantages of SNNs while keeping the system efficient and accurate.

### Solutions:

- **Different encoding schemes**: For image inputs, rate coding is adopted, which maps pixel intensity directly to the firing rate of neurons. For audio inputs, time-to-first-spike (TTFS) coding is used, with an exponential function used to calculate the threshold, converting the input signal into precise timing information.
- **Bio-inspired learning mechanism**: A spike-timing-dependent plasticity (STDP) mechanism updates synaptic weights according to the time difference between the firings of the presynaptic and postsynaptic neurons. Rate-based STDP and time-based STDP are defined for the two input types (image and audio), respectively.
- **Neuron dynamics**: A leaky integrate-and-fire (LIF) neuron model capable of processing both time-encoded and rate-encoded inputs is designed, so that both inputs can affect the neuron dynamics.
- **Input mask and bias correction**: The individual contributions of the image and audio inputs are evaluated separately by masking, and a bias term is calculated to optimize the decoding scheme for multimodal inputs.

### Conclusions:

Through the above methods, the paper proposes a bio-inspired approach intended to improve the performance of SNNs in multimodal perception tasks, in particular to enable more natural human-robot interaction in robotics. The work contributes to the development of SNNs and points to a direction for future research.
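The building blocks listed under Solutions (rate coding, TTFS coding, LIF dynamics, and STDP) can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names, the Bernoulli spike generation, the decaying form of the exponential threshold `exp(-t / tau)`, and all constants (`tau`, `leak`, `a_plus`, `a_minus`) are assumptions chosen only to make the ideas concrete.

```python
import numpy as np


def rate_encode(pixels, n_steps, rng=None):
    """Rate coding: intensity in [0, 1] is used as the per-step spike probability.

    Assumption: Bernoulli sampling per time step; the summary only says that
    pixel intensity maps to firing rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.clip(np.asarray(pixels, dtype=float), 0.0, 1.0)
    # Boolean spike train of shape (n_steps, *pixels.shape)
    return rng.random((n_steps,) + p.shape) < p


def ttfs_encode(x, n_steps, tau=5.0):
    """Time-to-first-spike: stronger inputs fire earlier, at most once.

    Assumption: the threshold decays as exp(-t / tau) and an input fires at
    the first step where its value reaches the threshold. Returns the spike
    step per input, or -1 if the input never fires within n_steps.
    """
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    threshold = np.exp(-np.arange(n_steps) / tau)
    threshold = threshold.reshape(-1, *([1] * x.ndim))
    crossed = x[None, ...] >= threshold
    first = np.argmax(crossed, axis=0)  # index of the first True per input
    return np.where(crossed.any(axis=0), first, -1)


def lif_step(v, input_current, v_rest=0.0, v_thresh=1.0, leak=0.9):
    """One discrete leaky integrate-and-fire update (hypothetical constants)."""
    v = v_rest + leak * (v - v_rest) + input_current
    spikes = v >= v_thresh
    v = np.where(spikes, v_rest, v)  # reset the neurons that fired
    return v, spikes


def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP weight change from pre/post spike times.

    Pre before post (dt >= 0) potentiates; post before pre depresses.
    Treating dt == 0 as potentiation is a convention chosen here.
    """
    dt = np.asarray(t_post, dtype=float) - np.asarray(t_pre, dtype=float)
    decay = np.exp(-np.abs(dt) / tau)
    return np.where(dt >= 0, a_plus * decay, -a_minus * decay)


if __name__ == "__main__":
    image = np.array([0.0, 0.5, 1.0])
    spikes = rate_encode(image, n_steps=100, rng=np.random.default_rng(0))
    print("mean firing rate per pixel:", spikes.mean(axis=0))
    print("TTFS spike steps:", ttfs_encode(np.array([1.0, 0.5, 0.0]), 20))
    v, fired = lif_step(np.zeros(2), np.array([0.5, 1.2]))
    print("membrane potential:", v, "fired:", fired)
    print("STDP, pre before post:", stdp_dw(10.0, 15.0))
    print("STDP, post before pre:", stdp_dw(15.0, 10.0))
```

In this sketch an input of 0 never spikes under either encoding, and a TTFS input of 1.0 fires at step 0 because the assumed threshold starts at 1. How the paper itself combines the rate-coded and time-coded streams inside a single LIF neuron, and how it defines the rate-based variant of STDP, is not specified in the summary above and is not modeled here.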

Spiking neural networks: Towards bio-inspired multimodal perception in robotics

A Survey of Neuromorphic Computing Based on Spiking Neural Networks

Research Advances and New Paradigms for Biology-inspired Spiking Neural Networks

CSNN: an Augmented Spiking Based Framework with Perceptron-Inception

Hierarchical Spiking-Based Model for Efficient Image Classification with Enhanced Feature Extraction and Encoding

A Review of Recent Advances and Application for Spiking Neural Networks

Speech Emotion Recognition with Early Visual Cross-modal Enhancement Using Spiking Neural Networks

Deep CovDenseSNN: A Hierarchical Event-Driven Dynamic Framework with Spiking Neurons in Noisy Environment

Biologically Inspired Structure Learning with Reverse Knowledge Distillation for Spiking Neural Networks

Robust Transcoding Sensory Information with Neural Spikes

Event-Based Multimodal Spiking Neural Network with Attention Mechanism

A Modular Spiking Neural Network-Based Neuro-Robotic System for Exploring Embodied Intelligence

Spiking Neural Networks Based Cortex Like Mechanism: A Case Study for Facial Expression Recognition

Spiking Neural Networks for event-based action recognition: A new task to understand their advantage

A review of learning in biologically plausible spiking neural networks

Spiking Neural Networks -- Part III: Neuromorphic Communications

Recent Advances and New Frontiers in Spiking Neural Networks

Recent Advances in Scalable Energy-Efficient and Trustworthy Spiking Neural networks: from Algorithms to Technology

Spiking neural networks for physiological and speech signals: a review

Spiking neural networks for biomedical signal analysis