DATRA-MIV: Decoder-Adaptive Tiling and Rate Allocation for MPEG Immersive Video

Jong-Beom Jeong, Soonbin Lee, Eun-Seok Ryu
DOI: https://doi.org/10.1145/3648371
2024-02-19
Abstract: The emerging immersive video coding standard, moving picture experts group (MPEG) immersive video (MIV), which is undergoing standardization by the MPEG-Immersive (MPEG-I) group, enables six degrees of freedom (6DoF) in a virtual reality (VR) environment that represents both natural and computer-generated scenes using multi-view video compression. MIV eliminates the redundancy between multi-view videos and merges the residuals into multiple pictures, each called an atlas. Thus, bitstreams with encoded atlases are generated and a corresponding number of decoders is needed, which is challenging for a lightweight device with a single decoder. This paper proposes a decoder-adaptive tiling and rate allocation (DATRA) method for MIV to overcome this challenge. First, the proposed method divides the atlases into subpictures considering two aspects: (i) extraction of subpicture bitstreams and their merging into one bitstream so that a single decoder can be used, and (ii) separation of each source view from the atlases for rate allocation. Second, the atlases are encoded by versatile video coding (VVC), using an extractable subpicture (ES) to divide the atlases into subpictures. Third, each subpicture bitstream is extracted, and asymmetric quality allocation for each subpicture is conducted by considering the residuals in the subpicture. Fourth, the mixed-quality subpictures are merged by using the proposed bitstream merger. Fifth, the merged bitstream is decoded by using a single decoder. Finally, the viewing area of the user is synthesized by using the reconstructed atlases. Experimental results with the VVC test model (VTM) show that the proposed method achieves a 21.37% Bjøntegaard delta rate (BD-rate) saving for immersive video peak signal-to-noise ratio (IV-PSNR) and a 26.76% decoding runtime saving compared to the VTM anchor configuration. Moreover, it supports bitstreams for both multiple decoders and a single decoder without re-encoding, transcoding, or a substantial increase in server-side storage.
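The asymmetric quality allocation in the third step can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract only states that allocation considers the residuals in each subpicture. The sketch assumes that each subpicture has already been encoded at several quantization parameter (QP) levels, that a per-subpicture residual ratio is available, and that subpictures carrying more residual content receive a lower QP (higher quality). The names `Subpicture`, `residual_ratio`, `allocate_qp`, and the QP values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subpicture:
    name: str
    # Hypothetical measure: fraction of the subpicture area (0..1)
    # occupied by residual content after redundancy removal.
    residual_ratio: float


def allocate_qp(subpictures, qp_levels=(22, 27, 32, 37)):
    """Pick one pre-encoded QP level per subpicture.

    Assumption (not stated in the abstract): more residual content
    maps to a lower QP, i.e. higher quality.
    """
    levels = sorted(qp_levels)
    allocation = {}
    for sp in subpictures:
        # Clamp the ratio so out-of-range inputs cannot index past the list.
        ratio = min(max(sp.residual_ratio, 0.0), 1.0)
        # ratio 1.0 selects the lowest QP, ratio 0.0 the highest.
        index = round((1.0 - ratio) * (len(levels) - 1))
        allocation[sp.name] = levels[index]
    return allocation


if __name__ == "__main__":
    atlas = [
        Subpicture("view_0", 1.0),
        Subpicture("view_1", 0.6),
        Subpicture("view_2", 0.0),
    ]
    print(allocate_qp(atlas))
```

Because the choice is made among already encoded subpicture bitstreams, a selection like this would not require re-encoding, which is consistent with the claim at the end of the abstract; the merger would then combine the chosen subpicture bitstreams into one stream for a single decoder.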
computer science, information systems, theory & methods, software engineering