Spiking neural networks: Towards bio-inspired multimodal perception in robotics

Katerina Maria Oikonomou, Vasiliki Balaska, Konstantinos A. Tsintotas, Christos N. Mavridis, Ioannis Kansizoglou, Antonios Gasteratos
2024-11-21
Abstract: Spiking neural networks (SNNs) have attracted considerable interest in recent years, stemming from neuroscience and reaching the field of artificial intelligence. However, due to their nature, SNNs remain far behind in achieving the exceptional performance of deep neural networks (DNNs). As a result, many scholars are exploring ways to enhance SNNs by using learning techniques from DNNs. While this approach has been proven to achieve considerable improvements in SNN performance, we propose another perspective: enhancing the biological plausibility of the models to fully leverage the advantages of SNNs. Our approach aims to propose a brain-like combination of audio-visual signal processing for recognition tasks, intended to succeed in more bio-plausible human-robot interaction applications.
Image and Video Processing
What problem does this paper attempt to address?
This paper addresses how to improve the performance of spiking neural networks (SNNs) in multimodal perception tasks by enhancing their biological plausibility, with a focus on audio-visual signal processing in robotics. Specifically, it proposes a bio-inspired method that uses different encoding schemes for image and audio inputs to improve the data representation ability of SNNs, with the goal of more biologically plausible human-robot interaction applications.

### Main research questions:

1. **Improving the biological plausibility of SNNs**: Although existing SNNs have advantages in energy efficiency, their performance still falls well short of that of deep neural networks (DNNs). The paper attempts to bridge this gap by enhancing the biological plausibility of SNNs.
2. **Multimodal data processing**: How to effectively combine image and audio inputs so as to mimic the way the mammalian brain processes audio-visual information synchronously, thereby improving the performance of SNNs in multimodal perception tasks.
3. **Optimizing encoding schemes and learning mechanisms**: How to select encoding schemes and learning mechanisms that fully exploit the advantages of SNNs while keeping the system efficient and accurate.

### Solutions:

- **Different encoding schemes**: Image inputs use rate coding, which maps pixel intensity directly to the firing rate of neurons. Audio inputs use time-to-first-spike (TTFS) coding, in which an exponential function is used to calculate the threshold, converting the input signal into precise spike-timing information.
- **Bio-inspired learning mechanism**: A spike-timing-dependent plasticity (STDP) mechanism updates synaptic weights according to the time difference between presynaptic and postsynaptic neuron firings. Rate-based STDP and time-based STDP are defined for the two input types (image and audio), respectively.
- **Neuron dynamics**: A leaky integrate-and-fire (LIF) neuron model capable of processing both time-encoded and rate-encoded inputs is designed, so that both inputs can affect the neuron dynamics.
- **Input mask and bias correction**: Masking is used to evaluate the individual performance of the image and audio inputs separately, and a bias term is calculated to optimize the decoding scheme for multimodal inputs.

### Conclusions:

Through the above methods, the paper proposes a bio-inspired method intended to improve the performance of SNNs in multimodal perception tasks, in particular to enable more natural human-robot interaction. The work contributes to the development of SNNs and suggests a direction for future research.
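
To make the mechanisms listed under Solutions more concrete, the following is a minimal illustrative sketch in plain Python. It is not the paper's implementation. The function names, the Bernoulli form of rate coding, the threshold shape `exp(-t / tau)`, the pair-based exponential STDP window, and all parameter values are assumptions chosen for illustration; the summary above does not specify them.

```python
import math
import random


def rate_encode(intensity, n_steps, rng):
    """Rate coding (assumed Bernoulli form): a pixel intensity in [0, 1]
    is used as the per-step spike probability."""
    p = min(max(intensity, 0.0), 1.0)
    return [1 if rng.random() < p else 0 for _ in range(n_steps)]


def ttfs_encode(value, t_max=20.0, tau=5.0):
    """Time-to-first-spike coding with an assumed decaying threshold.

    The input fires at the first time t where value >= exp(-t / tau),
    i.e. t = -tau * ln(value), so stronger inputs fire earlier.
    Returns None if no spike occurs within t_max.
    """
    if value <= 0.0:
        return None
    t = max(0.0, -tau * math.log(value))
    return t if t <= t_max else None


def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP (assumed exponential window) for one spike pair."""
    dt = t_post - t_pre
    if dt >= 0:
        # Presynaptic spike precedes postsynaptic spike: potentiation.
        return a_plus * math.exp(-dt / tau)
    # Postsynaptic spike precedes presynaptic spike: depression.
    return -a_minus * math.exp(dt / tau)


def lif_step(v, input_current, tau_m=10.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron.

    Returns (new_membrane_potential, spike), where spike is 0 or 1.
    """
    v = v + dt * (-v / tau_m + input_current)
    if v >= v_th:
        return v_reset, 1
    return v, 0


# Small demonstration with hypothetical inputs.
rng = random.Random(0)
image_spikes = rate_encode(0.8, 100, rng)   # bright pixel: many spikes
audio_spike_time = ttfs_encode(0.5)         # mid-strength audio feature
print("image spike count:", sum(image_spikes))
print("audio first-spike time:", audio_spike_time)
print("STDP, pre before post:", stdp_dw(10.0, 15.0))
print("STDP, post before pre:", stdp_dw(15.0, 10.0))
```

In this sketch the two encoders produce different spike representations (a spike train for images, a single spike time for audio), which is why the summary distinguishes rate-based from time-based STDP. How the paper combines both within a single LIF neuron is not detailed here, so `lif_step` simply integrates a summed input current.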