MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking

Zhong Wang, Zengyu Wan, Han Han, Bohao Liao, Yuliang Wu, Wei Zhai, Yang Cao, Zheng-jun Zha

2024-04-30

Abstract: Event-based eye tracking has shown great promise thanks to the high temporal resolution and low redundancy provided by the event camera. However, the diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization. To achieve a stable event-based eye-tracking system, this paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism that fully utilizes contextual temporal information in response to the variability of eye movements. Specifically, the MambaPupil network is proposed, which consists of a multi-layer convolutional encoder to extract features from the event representations, a bidirectional Gated Recurrent Unit (GRU), and a Linear Time-Varying State Space Module (LTV-SSM) to selectively capture contextual correlation from the forward and backward temporal relationships. Furthermore, Bina-rep is used as a compact event representation, and a tailor-made data augmentation called Event-Cutout is proposed to enhance the model's robustness by applying spatial random masking to the event image. Evaluation on the ThreeET-plus benchmark shows the superior performance of MambaPupil, which secured 1st place in the CVPR'2024 AIS Event-based Eye Tracking challenge.
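As a rough illustration of the recurrent part described in the abstract, a bidirectional GRU followed by a linear time-varying state-space layer can be written in generic notation as below. This is a sketch under assumptions: the symbols $x_t$, $u_t$, $s_t$, $y_t$, $A_t$, $B_t$, $C_t$ are introduced here for illustration, and the paper's exact parameterization may differ.

$$
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{GRU}_{\mathrm{fwd}}\left(x_t, \overrightarrow{h}_{t-1}\right), \qquad
\overleftarrow{h}_t = \mathrm{GRU}_{\mathrm{bwd}}\left(x_t, \overleftarrow{h}_{t+1}\right), \\
u_t &= \left[\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\right], \\
s_t &= A_t\, s_{t-1} + B_t\, u_t, \qquad y_t = C_t\, s_t .
\end{aligned}
$$

Here $x_t$ stands for the encoder feature at time step $t$. In a time-invariant state-space model the matrices $A$, $B$, $C$ are fixed. In a time-varying (selective) one they depend on the input at step $t$, which is what lets the layer emphasize some time steps and suppress others.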

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses eye tracking with event cameras. Specifically, it targets the following challenges:

1. **Target loss due to blinking**: During a blink, the eyelid completely covers the eyeball, so the pupil cannot be observed for a short period, and the large number of irrelevant events generated by the blink may cause the model's predictions to fluctuate.
2. **Sparse events when the eye is stationary**: Because event cameras generate signals only when brightness changes, event information is extremely sparse when the eye is stationary and is insufficient to support accurate predictions.
3. **Interference from other objects**: Signals generated by glasses, eyelashes, and iris reflections interfere with pupil tracking. Glasses in particular can cause significant and persistent prediction biases.

To address these challenges, the paper proposes a framework called MambaPupil, which includes the following key components:

- **Dual Recurrent Module**: Uses a bidirectional gated recurrent unit (Bi-GRU) to extract contextual temporal information and a linear time-varying state space module (LTV-SSM) to selectively focus on the informative stages of eye movement.
- **Bina-rep**: Converts event data into a binary representation, reducing input size and limiting the influence of noise.
- **Event-Cutout**: Improves the model's robustness in complex situations by randomly occluding spatial regions of the event image.

Experimental results show that MambaPupil performs strongly on the ThreeET-plus benchmark, particularly in challenging scenarios such as blinking, rapid eye movements, and stationary eyes, where it tracks accurately and stably. The method also outperforms existing methods in terms of parameter size and computational load.
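The Bina-rep and Event-Cutout components described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `bina_rep` and `event_cutout`, the bit ordering, the rectangular mask shape, and the `max_frac` parameter are all hypothetical choices made for this example.

```python
import random


def bina_rep(binary_frames):
    """Pack N binary event frames (each H x W, values 0 or 1) into one
    H x W frame by reading the N bits at each pixel as an N-bit integer,
    then scaling to [0, 1].

    Assumption: the first frame is treated as the most significant bit.
    """
    n = len(binary_frames)
    h, w = len(binary_frames[0]), len(binary_frames[0][0])
    scale = float(2 ** n - 1)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            value = 0
            for frame in binary_frames:
                value = (value << 1) | frame[y][x]
            out[y][x] = value / scale
    return out


def event_cutout(frame, max_frac=0.5, rng=random):
    """Return a copy of `frame` with one random rectangle set to zero.

    Assumption: the mask is a single axis-aligned rectangle whose height
    and width are at most `max_frac` of the frame size.
    """
    h, w = len(frame), len(frame[0])
    cut_h = rng.randint(1, max(1, int(h * max_frac)))
    cut_w = rng.randint(1, max(1, int(w * max_frac)))
    top = rng.randint(0, h - cut_h)
    left = rng.randint(0, w - cut_w)
    out = [row[:] for row in frame]  # copy so the input is left untouched
    for y in range(top, top + cut_h):
        for x in range(left, left + cut_w):
            out[y][x] = 0.0
    return out


if __name__ == "__main__":
    # Two 2x2 binary frames packed into one 2-bit frame.
    frames = [[[1, 0], [0, 0]],
              [[1, 1], [0, 0]]]
    packed = bina_rep(frames)
    print(packed)
    print(event_cutout(packed, rng=random.Random(0)))
```

Packing N frames into one keeps the input small while preserving the order of events within the window, and the cutout is applied only during training so that the model learns to cope with missing regions such as those hidden by an eyelid.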

A Multi-Scale Recurrent Framework for Motion Segmentation With Event Camera

MambaEVT: Event Stream based Visual Object Tracking using State Space Model

3ET: Efficient Event-based Eye Tracking using a Change-Based ConvLSTM Network

A Framework for Pupil Tracking with Event Cameras

Mamba-FETrack: Frame-Event Tracking via State Space Model

MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

E-Gaze: Gaze Estimation with Event Camera

Characterising Eye Movement Events With Multi-Scale Spatio-Temporal Awareness

Event Camera-Based Pupil Localization: Facilitating Training With Event-Style Translation of RGB Faces

End-to-End Learning of Object Motion Estimation from Retinal Events for Event-Based Object Tracking

Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba

Event-Based Eye Tracking. AIS 2024 Challenge Survey

FACET: Fast and Accurate Event-Based Eye Tracking Using Ellipse Modeling for Extended Reality

EyeTrAES: Fine-grained, Low-Latency Eye Tracking via Adaptive Event Slicing

A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera

Retina: Low-Power Eye Tracking with Event Camera and Spiking Hardware

Bayesian Eye Tracking

FAPNet: An Effective Frequency Adaptive Point-based Eye Tracker

Neuromorphic-Enabled Implementation of Extremely Low-Power Gaze Estimation

gazeNet: End-to-end eye-movement event detection with deep neural networks