Multi-task Gaze Estimation Via Unidirectional Convolution

Zhang Cheng, Yanxia Wang
2024-11-27
Abstract: Using lightweight models as backbone networks in gaze estimation tasks often results in significant performance degradation. The main reason is that lightweight networks usually have few feature channels, which limits the model's expressive ability. To improve the performance of lightweight models in gaze estimation tasks, a network model named Multitask-Gaze is proposed. The main components of Multitask-Gaze are Unidirectional Convolution (UC), Spatial and Channel Attention (SCA), Global Convolution Module (GCM), and Multi-task Regression Module (MRM). UC not only significantly reduces the number of parameters and FLOPs, but also extends the receptive field and improves the long-distance modeling capability of the model, thereby improving model performance. SCA highlights gaze-related features and suppresses gaze-irrelevant features. GCM replaces the pooling layer and avoids the performance degradation caused by information loss. MRM improves the accuracy of individual tasks and strengthens the connections between tasks for overall performance improvement. The experimental results show that, compared with the state-of-the-art method SUGE, the performance of Multitask-Gaze on the MPIIFaceGaze and Gaze360 datasets is improved by 1.71% and 2.75%, respectively, while the number of parameters and FLOPs are reduced by 75.5% and 86.88%, respectively.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is the performance degradation of lightweight models in gaze estimation. Because lightweight networks have few feature channels, their expressive ability is limited, which hurts performance. In addition, lightweight models usually use depthwise convolution instead of standard convolution, which further reduces cross-channel information fusion and weakens expressive ability. To address these problems, the authors propose a new multi-task gaze estimation model, Multitask-Gaze. It improves the performance of lightweight models in gaze estimation by introducing the following modules:

1. **Unidirectional Convolution (UC)**:
   - Replaces depthwise convolution. It not only significantly reduces the number of parameters and floating-point operations (FLOPs), but also expands the receptive field of the model and improves its long-distance modeling ability.
   - The parameter and FLOP counts of the original convolution (a full K_H × K_W kernel) and of UC (a 1 × K_W kernel plus a K_H × 1 kernel) are:
     \[ \text{Para}_{\text{original}} = K_H \times K_W \times C_{\text{in}} \times C_{\text{out}} \]
     \[ \text{Para}_{\text{UC}} = 1 \times K_W \times C_{\text{in}} \times C_{\text{out}} + K_H \times 1 \times C_{\text{in}} \times C_{\text{out}} \]
     \[ \text{FLOPs}_{\text{original}} = (K_H \times K_W \times C_{\text{in}} \times C_{\text{out}}) \times H_{\text{out}} \times W_{\text{out}} \]
     \[ \text{FLOPs}_{\text{UC}} = (1 \times K_W \times C_{\text{in}} \times C_{\text{out}}) \times H_{\text{out}} \times W_{\text{out}} + (K_H \times 1 \times C_{\text{in}} \times C_{\text{out}}) \times H_{\text{out}} \times W_{\text{out}} \]
2. **Spatial and Channel Attention (SCA)**:
   - Spatial attention provides global spatial information perception and increases the weight of important information; channel attention enables information interaction between channels and highlights gaze-related information.
3. **Global Convolution Module (GCM)**:
   - Replaces the pooling layer to avoid the performance degradation caused by information loss and to further fuse global information.
4. **Multi-task Regression Module (MRM)**:
   - Improves the accuracy of each individual task and strengthens the correlation between tasks, thereby enhancing overall performance.

The experimental results show that, compared with SUGE, the performance of Multitask-Gaze on the MPIIFaceGaze and Gaze360 datasets is improved by 1.71% and 2.75%, respectively, while the number of parameters and FLOPs are reduced by 75.5% and 86.88%, respectively. These results indicate that Multitask-Gaze is both more lightweight and more accurate.
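
The parameter and FLOP formulas given for UC can be checked numerically. The sketch below simply evaluates those four expressions; the function names and the example layer sizes (a 7 × 7 kernel, 64 input and output channels, a 56 × 56 output map) are illustrative assumptions, not values taken from the paper. Bias terms are ignored, as in the formulas, and "FLOPs" follows the summary's convention of one operation per weight per output position.

```python
def conv_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    """Para original: weights of a full K_H x K_W convolution (no bias)."""
    return k_h * k_w * c_in * c_out


def uc_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    """Para UC: one 1 x K_W kernel plus one K_H x 1 kernel (no bias)."""
    return 1 * k_w * c_in * c_out + k_h * 1 * c_in * c_out


def conv_flops(k_h: int, k_w: int, c_in: int, c_out: int,
               h_out: int, w_out: int) -> int:
    """FLOPs original: parameter count applied at every output position."""
    return conv_params(k_h, k_w, c_in, c_out) * h_out * w_out


def uc_flops(k_h: int, k_w: int, c_in: int, c_out: int,
             h_out: int, w_out: int) -> int:
    """FLOPs UC: both one-dimensional kernels applied at every output position."""
    return uc_params(k_h, k_w, c_in, c_out) * h_out * w_out


if __name__ == "__main__":
    # Hypothetical layer configuration, chosen only for illustration.
    k_h, k_w, c_in, c_out, h_out, w_out = 7, 7, 64, 64, 56, 56

    p_orig = conv_params(k_h, k_w, c_in, c_out)
    p_uc = uc_params(k_h, k_w, c_in, c_out)
    f_orig = conv_flops(k_h, k_w, c_in, c_out, h_out, w_out)
    f_uc = uc_flops(k_h, k_w, c_in, c_out, h_out, w_out)

    print(f"params: original={p_orig}, UC={p_uc}, ratio={p_uc / p_orig:.3f}")
    print(f"FLOPs:  original={f_orig}, UC={f_uc}, ratio={f_uc / f_orig:.3f}")
```

Both ratios reduce to (K_H + K_W) / (K_H × K_W), so by these formulas the saving grows with kernel size and only appears when K_H × K_W exceeds K_H + K_W; a 2 × 2 kernel, for instance, gives no reduction.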