Proposal with Alignment: A Bi-directional Transformer for 360° Video Viewport Proposal
Yichen Guo, Mai Xu, Lai Jiang, Xin Deng, Jing Zhou, Gaoxing Chen, Leonid Sigal
DOI: https://doi.org/10.1109/tcsvt.2024.3419910
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: People normally watch 360° videos through a head-mounted display, in which only the content within the viewport is visible. Therefore, viewport proposal, i.e., detecting potential viewport candidates, plays an important role in many 360° video processing tasks. In this paper, we advance viewport proposal by further aligning the predicted viewports across frames for each individual subject. This provides a better methodology and a deeper perspective for learning human perceptual behaviours on 360° videos. Specifically, we first analyze three 360° video datasets and obtain several findings on the human consistency, objectness and motion of viewports. Inspired by these findings, we propose a bi-directional transformer approach, named BiT, for 360° video viewport proposal and alignment. BiT is composed of a multi-level residual module, a bi-directional encoder-decoder module and a spherical matching module. In this way, viewports can be accurately proposed and aligned by considering multi-level, bi-directional and non-local information. Moreover, the viewports aligned by BiT are in turn used to refine the proposed viewports and improve viewport proposal accuracy. Finally, we validate that our BiT approach outperforms state-of-the-art approaches on viewport proposal. In addition, the aligned viewports from BiT are verified to be effective in multiple applications, such as saliency prediction, trajectory prediction and perceptual video compression.
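The abstract does not describe how the spherical matching module pairs viewports across frames. As a rough illustration only, and not the BiT module itself, the sketch below links viewport centres in two consecutive frames by minimising the total great-circle (haversine) distance. The function names `great_circle` and `match_viewports`, the (longitude, latitude) representation in degrees, and the exhaustive assignment search are all hypothetical choices made for this example. The point it shows is why matching should happen on the sphere: two centres at longitudes 179° and -179° are only 2° apart, even though their equirectangular coordinates differ by 358°.

```python
import math
from itertools import permutations


def great_circle(a, b):
    """Angular distance (radians) between two viewport centres.

    Hypothetical representation: each centre is (longitude, latitude) in degrees.
    Uses the haversine formula, which stays accurate for small angles.
    """
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return 2 * math.asin(math.sqrt(min(1.0, max(0.0, h))))


def match_viewports(prev, curr):
    """Pair each viewport in `prev` with a distinct viewport in `curr`.

    Returns (i, j) pairs minimising the summed great-circle distance.
    Hypothetical helper: exhaustive search over assignments, so it assumes
    len(prev) <= len(curr) and only a handful of viewports per frame.
    """
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(len(curr)), len(prev)):
        cost = sum(great_circle(prev[i], curr[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm))


prev_frame = [(179.0, 0.0), (0.0, 30.0)]
curr_frame = [(10.0, 28.0), (-178.0, 1.0)]
print(match_viewports(prev_frame, curr_frame))  # [(0, 1), (1, 0)]
```

Here the viewport at longitude 179° is linked to the one at -178°, because the haversine distance handles the wrap-around at ±180°. For larger sets of viewports, a polynomial-time solver such as the Hungarian algorithm would replace the exhaustive search.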