Incorporation of multi-dimensional binocular perceptual characteristics to detect stereoscopic video saliency
Yang Zhou, Yongjian He, Xianghong Tang, Yu Lu, Gangyi Jiang
DOI: https://doi.org/10.11834/jig.20170304
2017-01-01
Journal of Image and Graphics
Abstract:
Objective: Stereoscopic three-dimensional (3D) video services, which aim to provide realistic and immersive experiences, have gained considerable acceptance and interest. Visual saliency detection can automatically predict, locate, and identify important visual information, and it helps machines filter valuable information from high-volume multimedia data. Saliency detection models are widely studied for static or dynamic 2D scenes. However, the saliency problem of stereoscopic 3D videos has received less attention, and few studies address dynamic 3D scenes. Because 3D characteristics such as depth and visual fatigue affect human visual attention, saliency models for static or dynamic 2D scenes are not directly applicable to 3D scenes. To address this gap in the literature, we propose a novel model for 3D salient region detection in stereoscopic videos. The model utilizes multi-dimensional binocular perceptual characteristics.

Method: The proposed model computes the visually salient region of a stereoscopic video from its spatial, depth, and temporal domains. The algorithm is partitioned into four blocks: the measurement of spatial saliency, depth saliency, and temporal (motion) saliency, and the fusion of the three resulting conspicuity maps. In the spatial saliency module, the algorithm treats the spatial saliency of each video frame as one dimension of visual attention. A Bayesian probabilistic framework is adopted to calculate the 2D static conspicuity map. Within this framework, spatial saliency emerges naturally as the self-information of visual features. These visual features are obtained from the spatial natural statistics of the stereoscopic 3D video frames rather than from a single test frame. In the depth saliency module, the algorithm treats depth as an additional dimension of visual attention. Depth signals have specific characteristics that differ from those of natural signals. Therefore, the measure of depth saliency is derived from depth-perception characteristics. The model extracts the foreground saliency from a disparity map and combines it with depth contrast to generate a depth conspicuity map. In the motion (temporal) saliency module, the algorithm treats motion as another dimension of visual attention. An optical flow algorithm is applied to acquire the inter-frame motion information between adjacent frames. To reduce the computational complexity of the optical flow computation, the model first extracts the salient region of the current frame in accordance with the previously obtained spatial and depth conspicuity maps. The Lucas-Kanade optical flow algorithm is then adopted to calculate the motion characteristics between the local salient regions of adjacent frames, and the motion conspicuity map is produced from the regional motion vector map. In the fusion step, a new pooling approach is developed to combine the three conspicuity maps into the final saliency map for stereoscopic 3D videos. This fusion approach is based on the principle that the human visual system focuses on a single salient region while also diverting attention to several salient regions in a saliency map. To generate the final saliency maps of stereoscopic videos, the proposed approach replaces the conventional weighted-average sum of the different features with a fusion method based on the global-local difference.

Result: We evaluated the proposed scheme on stereoscopic video sequences covering various scenarios and compared the proposed model with five other state-of-the-art saliency detection models. The experimental results indicate that the proposed model is efficient and effective and outperforms the compared models in precision and recall, achieving 80% precision and 72% recall.

Conclusion: The proposed model demonstrated its efficiency and effectiveness in saliency detection for stereoscopic videos. The model can be applied to stereoscopic video or image coding, stereoscopic video or image quality assessment, and object detection and recognition.
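
The idea that spatial saliency "emerges naturally as self-information of visual features" can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `self_information_saliency`, the use of raw pixel intensity as the only feature, the histogram estimate of the feature probability, and the parameter `bins` are all assumptions made for illustration. The abstract only states that feature statistics come from many frames rather than from the single test frame, which is mirrored here by estimating probabilities from a separate pool of frames.

```python
import numpy as np


def self_information_saliency(frame, training_frames, bins=16):
    """Hypothetical helper: saliency as -log p(feature).

    Assumes frames are 2D float arrays with values in [0, 1] and uses
    pixel intensity as a stand-in for the (unspecified) visual features.
    """
    edges = np.linspace(0.0, 1.0, bins + 1)

    # Estimate feature probabilities from a pool of frames, not from
    # the test frame itself.
    pooled = np.concatenate([f.ravel() for f in training_frames])
    counts, _ = np.histogram(pooled, bins=edges)

    # Laplace smoothing keeps unseen feature values from producing log(0).
    probs = (counts + 1.0) / (counts.sum() + bins)

    # Map each pixel to its histogram bin, then to its self-information.
    idx = np.clip(np.digitize(frame, edges) - 1, 0, bins - 1)
    info = -np.log(probs[idx])

    # Rescale to [0, 1] so the map can be compared with other maps.
    span = info.max() - info.min()
    if span == 0:
        return np.zeros_like(info)
    return (info - info.min()) / span
```

Under these assumptions, feature values that are rare in the pooled statistics receive high saliency and common values receive low saliency, which is the behavior the self-information formulation is meant to capture.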