Model-guided Multi-path Knowledge Aggregation for Aerial Saliency Prediction

Kui Fu, Jia Li, Yu Zhang, Hongze Shen, Yonghong Tian
DOI: https://doi.org/10.1109/tip.2020.2998977
IF: 10.6
2018-01-01
IEEE Transactions on Image Processing
Abstract: As an emerging vision platform, a drone can look from many abnormal viewpoints which brings many new challenges into the classic vision task of video saliency prediction. To investigate these challenges, this paper proposes a large-scale video dataset for aerial saliency prediction, which consists of ground-truth salient object regions of 1,000 aerial videos, annotated by 24 subjects. To the best of our knowledge, it is the first large-scale video dataset that focuses on visual saliency prediction on drones. Based on this dataset, we propose a Model-guided Multi-path Network (MM-Net) that serves as a baseline model for aerial video saliency prediction. Inspired by the annotation process in eye-tracking experiments, MM-Net adopts multiple information paths, each of which is initialized under the guidance of a classic saliency model. After that, the visual saliency knowledge encoded in the most representative paths is selected and aggregated to improve the capability of MM-Net in predicting spatial saliency in aerial scenarios. Finally, these spatial predictions are adaptively combined with the temporal saliency predictions via a spatiotemporal optimization algorithm. Experimental results show that MM-Net outperforms ten state-of-the-art models in predicting aerial video saliency.