Predicting Task-Driven Attention Via Integrating Bottom-Up Stimulus and Top-Down Guidance

Zhixiong Nan, Jingjing Jiang, Xiaofeng Gao, Sanping Zhou, Weiliang Zuo, Ping Wei, Nanning Zheng
DOI: https://doi.org/10.1109/tip.2021.3113799
IF: 10.6
2021-01-01
IEEE Transactions on Image Processing
Abstract: Task-free attention has attracted intense interest in the computer vision community, while relatively few works focus on task-driven attention (TDAttention). This paper therefore addresses the problem of TDAttention prediction in daily scenarios where a human is performing a task. Motivated by the cognitive mechanism that human attention allocation is jointly controlled by top-down guidance and bottom-up stimulus, this paper proposes a cognitively explanatory deep neural network model to predict TDAttention. Given an image sequence, bottom-up features, such as human pose and motion, are first extracted. At the same time, coarse-grained and fine-grained task information are embedded as a top-down feature. The bottom-up features are then fused with the top-down feature to guide the model in predicting TDAttention. Two public datasets are re-annotated to make them suitable for TDAttention prediction, and our model is extensively compared with other models on both datasets. In addition, ablation studies are conducted to evaluate the individual modules of our model. Experimental results demonstrate the effectiveness of our model.