A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera

Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Olivier Coenen

2024-04-13

Abstract: Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical. To interface with such data and leverage their rich temporal features, we propose a causal spatiotemporal convolutional network. This solution targets efficient implementation on edge-appropriate hardware with limited resources in three ways: 1) it deliberately uses a simple architecture and set of operations (convolutions, ReLU activations); 2) it can be configured to perform online inference efficiently by buffering layer outputs; 3) it can achieve more than 90% activation sparsity through regularization during training, enabling very significant efficiency gains on event-based processors. In addition, we propose a general affine augmentation strategy acting directly on the events, which alleviates the problem of dataset scarcity for event-based systems. We apply our model to the AIS 2024 event-based eye tracking challenge, reaching a p10 accuracy of 0.9916 on the Kaggle private test set.
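To illustrate what an affine augmentation "acting directly on the events" could look like, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `affine_matrix` and `augment_events`, the choice of rotation, scale, and shift as parameters, and the rule of dropping events that leave the sensor area are all assumptions made for illustration. The same matrix is applied to the event coordinates and to the pupil-centre labels so that they stay consistent.

```python
import numpy as np

def affine_matrix(angle_rad, scale, shift_xy, center_xy):
    """Hypothetical helper: 3x3 homogeneous matrix that rotates and scales
    about center_xy, then translates by shift_xy."""
    c = np.cos(angle_rad) * scale
    s = np.sin(angle_rad) * scale
    cx, cy = center_xy
    tx, ty = shift_xy
    return np.array([
        [c, -s, cx - c * cx + s * cy + tx],
        [s,  c, cy - s * cx - c * cy + ty],
        [0.0, 0.0, 1.0],
    ])

def augment_events(xy, labels_xy, matrix, width, height):
    """Apply one affine map to event coordinates and to label coordinates.

    xy:        (N, 2) array of event (x, y) positions
    labels_xy: (M, 2) array of label (x, y) positions, e.g. pupil centres
    Returns the surviving transformed events, the boolean keep mask
    (usable to filter timestamps and polarities), and the transformed labels.
    """
    def apply(points):
        homog = np.column_stack([points, np.ones(len(points))])
        return (homog @ matrix.T)[:, :2]

    new_xy = apply(np.asarray(xy, dtype=float))
    # Assumption: events mapped outside the sensor area are discarded.
    keep = (
        (new_xy[:, 0] >= 0) & (new_xy[:, 0] <= width - 1)
        & (new_xy[:, 1] >= 0) & (new_xy[:, 1] <= height - 1)
    )
    return new_xy[keep], keep, apply(np.asarray(labels_xy, dtype=float))
```

Timestamps and polarities are left untouched in this sketch; the returned mask can be used to filter them alongside the coordinates.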

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses online eye tracking with event cameras in edge computing environments, where efficiency and low latency are critical. To handle the data generated by event cameras and leverage their rich temporal features, the authors propose a causal spatiotemporal convolutional network. The main contributions of the paper are:

1. **Design of a Lightweight Spatiotemporal Network**: A fully causal, lightweight spatiotemporal neural network that performs efficient online inference on streaming data through a FIFO buffer, without needing to store all time frames.
2. **Causal Event Volume Binning Strategy**: A causal event volume binning strategy that minimizes latency and avoids excessive buffering of the event stream during online inference.
3. **Increased Activation Sparsity**: L1 regularization during training raises the sparsity (fraction of zero-valued outputs) of each layer's output above 90%, which enables efficient inference on processors that can exploit this sparsity.
4. **Normalization Strategy**: Alternating BatchNorm and GroupNorm layers are used while maintaining complete causality during inference.

The paper applies the proposed model to the AIS 2024 Event-based Eye-tracking Challenge and achieves a p10 accuracy of 0.9916 on the Kaggle private test set. It also reviews related work, including different event binning methods, spatiotemporal networks, and lightweight detection heads, and describes in detail how the event data are processed, how the network architecture is designed, and how it is configured for online inference. Finally, the paper validates the contribution of each component to the final result through a series of experiments with detailed analysis.
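The FIFO-buffer idea behind the online inference mode can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the paper's network: the class name `StreamingCausalConv1d` and the reference function `offline_causal_conv` are hypothetical, and only a single temporal kernel shared across all features is shown (the spatial convolutions are omitted). The point is that a buffer holding only the last `k` frames gives the same output as a zero-padded causal convolution over the full sequence.

```python
from collections import deque

import numpy as np

class StreamingCausalConv1d:
    """Hypothetical sketch: causal temporal convolution, one frame at a time."""

    def __init__(self, kernel):
        # kernel[-1] weights the newest frame, kernel[0] the oldest.
        self.kernel = np.asarray(kernel, dtype=float)
        self.buffer = None

    def step(self, frame):
        frame = np.asarray(frame, dtype=float)
        if self.buffer is None:
            # A zero-filled history is equivalent to causal zero padding.
            k = len(self.kernel)
            self.buffer = deque([np.zeros_like(frame)] * k, maxlen=k)
        # Appending to a full deque drops the oldest frame (FIFO).
        self.buffer.append(frame)
        return sum(w * f for w, f in zip(self.kernel, self.buffer))

def offline_causal_conv(frames, kernel):
    """Reference: the same convolution computed over the whole sequence."""
    frames = np.asarray(frames, dtype=float)
    k = len(kernel)
    padded = np.concatenate([np.zeros((k - 1,) + frames.shape[1:]), frames])
    return np.stack([
        np.tensordot(kernel, padded[t:t + k], axes=1)
        for t in range(len(frames))
    ])
```

Memory use in the streaming version depends only on the kernel length, not on how long the stream has been running, which is what makes this style of inference attractive on resource-limited hardware.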

Event-based Object Detection with Lightweight Spatial Attention Mechanism

ECSNet: Spatio-Temporal Feature Learning for Event Camera

Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN

3ET: Efficient Event-based Eye Tracking using a Change-Based ConvLSTM Network

Evaluating Image-Based Face and Eye Tracking with Event Cameras

Real-Time Multi-Task Facial Analytics With Event Cameras

Graph-based Asynchronous Event Processing for Rapid Object Recognition

EVtracker: An Event-Driven Spatiotemporal Method for Dynamic Object Tracking

Real-time face & eye tracking and blink detection using event cameras

EDeNN: Event Decay Neural Networks for low latency vision

Asynchronous Spatio-Temporal Memory Network for Continuous Event-Based Object Detection

Spatiotemporal Feature Learning for Event-Based Vision

FAPNet: An Effective Frequency Adaptive Point-based Eye Tracker

End-to-End Learning of Object Motion Estimation from Retinal Events for Event-Based Object Tracking

Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation

Neuromorphic-Enabled Implementation of Extremely Low-Power Gaze Estimation

EvConv: Fast CNN Inference on Event Camera Inputs For High-Speed Robot Perception

Retina: Low-Power Eye Tracking with Event Camera and Spiking Hardware

Data-driven Feature Tracking for Event Cameras

A Universal Event-Based Plug-In Module for Visual Object Tracking in Degraded Conditions