Predicting Eye Fixations Using Convolutional Neural Networks

Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu
DOI: https://doi.org/10.1109/tnnls.2016.2628878
IF: 14.255
2018-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: It is believed that eye movements during free viewing of natural scenes are directed by both bottom-up visual saliency and top-down visual factors. In this paper, we propose a novel computational framework to simultaneously learn these two types of visual features from raw image data using a multiresolution convolutional neural network (Mr-CNN) for predicting eye fixations. The Mr-CNN is trained directly on image regions centered on fixation and non-fixation locations over multiple resolutions, using raw image pixels as inputs and eye fixation attributes as labels. Diverse top-down visual features can be learned in the higher layers. Meanwhile, bottom-up visual saliency can be inferred by combining information over multiple resolutions. Finally, the optimal integration of bottom-up and top-down cues can be learned in the last logistic regression layer to predict eye fixations. The proposed approach achieves state-of-the-art results on four publicly available benchmark datasets, demonstrating the superiority of our work.
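The multiresolution input described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the patch size, and the subsampling factors are assumptions chosen for illustration, and plain strided subsampling stands in for whatever rescaling the paper actually uses. The sketch shows only how same-sized patches centered on one location cover progressively more context at coarser resolutions.

```python
import numpy as np

def extract_multires_patches(image, center, patch_size=8, strides=(1, 2, 4)):
    """Return one patch_size x patch_size patch per resolution around `center`.

    image:   2-D grayscale array.
    center:  (row, col) in full-resolution coordinates.
    strides: hypothetical subsampling factors (assumed, not from the paper).
    """
    half = patch_size // 2
    patches = []
    for s in strides:
        # Crude downsampling: keep every s-th pixel (no anti-alias filtering).
        low = image[::s, ::s]
        r, c = center[0] // s, center[1] // s
        # Zero-pad so that patches near the border keep the full size.
        padded = np.pad(low, half, mode="constant")
        # Index r in `low` is index r + half in `padded`, so this window
        # places the target pixel at position (half, half) of the patch.
        patches.append(padded[r:r + patch_size, c:c + patch_size])
    return np.stack(patches)

image = np.arange(64 * 64, dtype=float).reshape(64, 64)
patches = extract_multires_patches(image, (32, 32))
print(patches.shape)
```

In a training setup along the lines the abstract describes, each stack of patches would be paired with a binary label (fixation or non-fixation) and fed to one network branch per resolution.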