Predicting Human Scanpaths in Visual Question Answering (Supplementary Materials)

Xianyu Chen, Ming Jiang, Qi Zhao
2021-01-01
Abstract: We present additional results to investigate the effects of hyperparameters, visual encoder backbones, machine attention mechanisms, and more (Section 2). These results suggest that our method not only generalizes across multiple tasks but is also flexible enough to work with different visual encoders and task guidance maps. The results also suggest that our predicted scanpaths can fixate task-relevant objects in both VQA and visual search.