An event-based implementation of saliency-based visual attention for rapid scene analysis

Camille Simon Chane, Ernst Niebur, Ryad Benosman, Sio-Hoi Ieng
2024-01-10
Abstract: Selective attention is an essential mechanism to filter sensory input and to select only its most important components, allowing the capacity-limited cognitive structures of the brain to process them in detail. The saliency map model, originally developed to understand the process of selective attention in the primate visual system, has also been extensively used in computer vision. Due to the widespread use of frame-based video, this is how dynamic input from non-stationary scenes is commonly implemented in saliency maps. However, the temporal structure of this input modality is very different from that of the primate visual system. Retinal input to the brain is massively parallel, local rather than frame-based, asynchronous rather than synchronous, and transmitted in the form of discrete events, neuronal action potentials (spikes). These features are captured by event-based cameras. We show that a computational saliency model can be obtained organically from such vision sensors, at minimal computational cost. We assess the performance of the model by comparing its predictions with the distribution of overt attention (fixations) of human observers, and we make available an event-based dataset that can be used as ground truth for future studies.
Image and Video Processing, Signal Processing
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the problem of efficiently implementing visual attention mechanisms and applying them to dynamic scene analysis. Specifically:

1. **Visual Attention Mechanism**:
   - Traditional visual attention models are usually based on frame-based video input, which differs from the processing method of the primate visual system. The paper proposes a computational model based on event-based sensors to process visual input in a manner closer to biological visual systems.
2. **Efficient Computation**:
   - Event-driven sensors can generate visual saliency maps at extremely low computational costs because they only record information when changes are detected, thereby reducing redundant data.
3. **Dynamic Scene Analysis**:
   - The paper demonstrates how to achieve visual saliency computation in dynamic scenes using event-driven visual sensors and evaluates its performance by comparing it with the fixation point distribution of human observers.
4. **Dataset Contribution**:
   - The paper also provides an event-driven dataset to serve as benchmark data for future research.

In summary, the paper focuses on how to utilize event-driven visual sensors to achieve efficient and biologically inspired visual attention mechanisms and evaluates their performance in dynamic scene analysis.
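
The efficiency argument in point 2 can be made concrete with a toy example. The sketch below is not the paper's saliency model. It is a minimal illustration, under assumed names and parameters, of why event-driven input is cheap to process: each event updates only the pixel that fired, so the work scales with the number of events rather than with frame size times frame rate. The `(t, x, y)` event format, the function names `event_activity_map` and `most_active_pixel`, and the decay constant `tau` are all hypothetical choices made for illustration.

```python
import math


def event_activity_map(events, width, height, tau=0.05):
    """Accumulate events into a leaky per-pixel activity map.

    events: iterable of (t, x, y) tuples sorted by timestamp t in seconds
            (an assumed format, not the paper's dataset layout).
    tau:    decay time constant in seconds (illustrative value).
    """
    activity = [[0.0] * width for _ in range(height)]
    last_t = [[0.0] * width for _ in range(height)]
    t_end = 0.0
    for t, x, y in events:
        # Only the pixel that fired is touched: decay its old value
        # to the current time, then add one unit for the new event.
        decay = math.exp(-(t - last_t[y][x]) / tau)
        activity[y][x] = activity[y][x] * decay + 1.0
        last_t[y][x] = t
        t_end = t
    # Read-out step: bring every pixel to the common time t_end.
    for y in range(height):
        for x in range(width):
            activity[y][x] *= math.exp(-(t_end - last_t[y][x]) / tau)
    return activity


def most_active_pixel(activity):
    """Return (x, y) of the pixel with the highest activity."""
    value, x, y = max(
        (v, x, y) for y, row in enumerate(activity) for x, v in enumerate(row)
    )
    return x, y


# Three events at pixel (1, 1) and a single event at pixel (3, 0).
events = [(0.00, 1, 1), (0.01, 1, 1), (0.02, 1, 1), (0.02, 3, 0)]
activity = event_activity_map(events, width=4, height=2)
peak = most_active_pixel(activity)
```

Here the pixel that fired repeatedly ends up with the highest activity, while silent pixels cost nothing until the final read-out. A real saliency model would add feature channels and spatial competition on top of such a representation; this sketch covers only the sparse-update idea.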