Abstract: Using lightweight models as backbone networks in gaze estimation tasks often results in significant performance degradation. The main reason is that lightweight networks usually have few feature channels, which limits the model's expressive ability. To improve the performance of lightweight models in gaze estimation tasks, a network model named Multitask-Gaze is proposed. The main components of Multitask-Gaze are Unidirectional Convolution (UC), Spatial and Channel Attention (SCA), Global Convolution Module (GCM), and Multi-task Regression Module (MRM). UC not only significantly reduces the number of parameters and FLOPs, but also extends the receptive field and improves the long-distance modeling capability of the model, thereby improving model performance. SCA highlights gaze-related features and suppresses gaze-irrelevant features. GCM replaces the pooling layer and avoids the performance degradation caused by information loss. MRM improves the accuracy of individual tasks and strengthens the connections between tasks, improving overall performance. The experimental results show that, compared with the state-of-the-art method SUGE, the performance of Multitask-Gaze on the MPIIFaceGaze and Gaze360 datasets is improved by 1.71% and 2.75%, respectively, while the number of parameters and FLOPs are reduced by 75.5% and 86.88%, respectively.

What problem does this paper attempt to address?

The main problem this paper addresses is the performance degradation of lightweight models in the gaze estimation task. Because lightweight networks have few feature channels, their expressive ability is limited, which hurts model performance. In addition, lightweight models usually use depthwise convolution instead of standard convolution, which further reduces cross-channel information fusion and weakens the expressive ability of the models.

To address these problems, the authors propose a new multi-task gaze estimation model, Multitask-Gaze. This model improves the performance of lightweight models in the gaze estimation task by introducing the following modules:

1. **Unidirectional Convolution (UC)**:
   - Replaces depthwise convolution. It not only significantly reduces the number of parameters and floating-point operations (FLOPs), but also expands the receptive field of the model and improves its long-distance modeling ability.
   - The parameter counts and FLOPs of the original convolution and of UC are given below, where \(K_H \times K_W\) is the kernel size, \(C_{\text{in}}\) and \(C_{\text{out}}\) are the input and output channel counts, and \(H_{\text{out}} \times W_{\text{out}}\) is the output feature map size:

     \[ \text{Para}_{\text{original}} = K_H \times K_W \times C_{\text{in}} \times C_{\text{out}} \]

     \[ \text{Para}_{\text{UC}} = 1 \times K_W \times C_{\text{in}} \times C_{\text{out}} + K_H \times 1 \times C_{\text{in}} \times C_{\text{out}} \]

     \[ \text{FLOPs}_{\text{original}} = (K_H \times K_W \times C_{\text{in}} \times C_{\text{out}}) \times H_{\text{out}} \times W_{\text{out}} \]

     \[ \text{FLOPs}_{\text{UC}} = (1 \times K_W \times C_{\text{in}} \times C_{\text{out}}) \times H_{\text{out}} \times W_{\text{out}} + (K_H \times 1 \times C_{\text{in}} \times C_{\text{out}}) \times H_{\text{out}} \times W_{\text{out}} \]

2. **Spatial and Channel Attention (SCA)**:
   - Spatial attention provides global spatial information perception and increases the weight of important information; channel attention enables information interaction between channels and highlights gaze-related information.

3. **Global Convolution Module (GCM)**:
   - Replaces the pooling layer to avoid the performance degradation caused by information loss and to further fuse global information.

4. **Multi-task Regression Module (MRM)**:
   - Improves the accuracy of each individual task and strengthens the correlation between tasks, thereby enhancing overall performance.

The experimental results show that, compared with the state-of-the-art method SUGE, the performance of Multitask-Gaze on the MPIIFaceGaze and Gaze360 datasets is improved by 1.71% and 2.75%, respectively, while the number of parameters and FLOPs are reduced by 75.5% and 86.88%, respectively. This indicates that Multitask-Gaze is both more lightweight and more accurate.
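
The parameter and FLOPs formulas listed for UC can be checked numerically. The following is a minimal sketch that simply evaluates those four formulas; the function names and the example layer shape (a 7×7 kernel, 64 input and output channels, a 56×56 output map) are illustrative assumptions, not values taken from the paper, and biases are ignored just as in the formulas.

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Para_original = K_H * K_W * C_in * C_out (bias ignored)."""
    return k_h * k_w * c_in * c_out


def uc_params(k_h, k_w, c_in, c_out):
    """Para_UC = 1 * K_W * C_in * C_out + K_H * 1 * C_in * C_out."""
    return 1 * k_w * c_in * c_out + k_h * 1 * c_in * c_out


def conv_flops(k_h, k_w, c_in, c_out, h_out, w_out):
    """FLOPs_original = Para_original * H_out * W_out."""
    return conv_params(k_h, k_w, c_in, c_out) * h_out * w_out


def uc_flops(k_h, k_w, c_in, c_out, h_out, w_out):
    """FLOPs_UC = Para_UC * H_out * W_out (both terms share the output size)."""
    return uc_params(k_h, k_w, c_in, c_out) * h_out * w_out


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    k_h, k_w, c_in, c_out, h_out, w_out = 7, 7, 64, 64, 56, 56

    p_orig = conv_params(k_h, k_w, c_in, c_out)
    p_uc = uc_params(k_h, k_w, c_in, c_out)
    f_orig = conv_flops(k_h, k_w, c_in, c_out, h_out, w_out)
    f_uc = uc_flops(k_h, k_w, c_in, c_out, h_out, w_out)

    print("params original:", p_orig)
    print("params UC:      ", p_uc)
    print("param ratio UC/original:", p_uc / p_orig)
    print("FLOPs ratio UC/original:", f_uc / f_orig)
```

Under these formulas, both ratios reduce to \((K_H + K_W) / (K_H \times K_W)\), so the saving grows with kernel size: a 3×3 kernel keeps 2/3 of the cost, while a 7×7 kernel keeps 2/7.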

Multi-task Gaze Estimation Via Unidirectional Convolution

PerimetryNet: A Multiscale Fine Grained Deep Network for Three-Dimensional Eye Gaze Estimation Using Visual Field Analysis

Multiview Multitask Gaze Estimation with Deep Convolutional Neural Networks

Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model

Multi-Person Gaze-Following with Numerical Coordinate Regression

Gaze Estimation Based on the Improved Xception Network

Highly efficient gaze estimation method using online convolutional re-parameterization

End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context

MUGGLE: MUlti-Stream Group Gaze Learning and Estimation

LNSMM: Eye Gaze Estimation With Local Network Share Multiview Multitask

Depth-aware gaze-following via auxiliary networks for robotics

Acceleration of multi-task cascaded convolutional networks

Efficient End-to-End Convolutional Architecture for Point-of-Gaze Estimation

HybridGazeNet: Geometric model guided Convolutional Neural Networks for gaze estimation

EM-Net: Gaze Estimation with Expectation Maximization Algorithm

Fine-grained gaze estimation based on the combination of regression and classification losses

360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales

Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-based Gaze Estimation

FR-Net: A Light-weight FFT Residual Net For Gaze Estimation

Adaptive gaze estimation based on channel attention mechanism

Gaze Estimation via Modulation-based Adaptive Network with Auxiliary Self-Learning