Geometry Padding for Motion Compensated Prediction in 360 Video Coding
Yuwen He, Yan Ye, Philippe Hanhart, Xiaoyu Xiu
DOI: https://doi.org/10.1109/dcc.2017.18
2017-04-01
Abstract: 360 video has become popular in recent years as commercial interest in deploying Virtual Reality (VR) applications rises. This type of video is usually captured using multi-camera arrays, such as the GoPro Omni camera rig. After separate video streams are captured from the individual cameras, image stitching is applied to obtain a spherical representation of the scene that spans 360 degrees horizontally and 180 degrees vertically, hence the name 360 video. In the existing workflow of 360 spherical video coding, the 360 video is projected onto a 2D plane using a projection format such as equirectangular (ERP), cubemap (CMP), equal-area (EAP), or octahedron (OHP). Most, if not all, currently available 360 video content is provided in ERP format, which is defined in longitude and latitude. Projection format conversion may be performed to convert the native ERP format to another format before coding is applied. Some projection formats contain more than one face; for example, CMP projects the sphere onto the six faces of a cube, and OHP projects it onto the eight faces of an octahedron. For these multi-face projection formats, the faces are packed onto a 2D rectangular picture with a frame packing method. For example, the six faces of CMP can be packed in a 4×3 or a 3×2 configuration. Finally, the frame-packed picture is coded as conventional 2D video. Existing video codecs are designed with only conventional 2D video, captured on a plane, in mind. When motion compensated prediction uses any samples outside a reference picture's boundaries, padding is performed by simply copying the sample values at the picture boundaries. This repetitive padding method is referred to as the conventional 2D padding method and is widely used in video coding standards such as H.264 and High Efficiency Video Coding (HEVC). However, a 360 video encompasses video information on the whole sphere, and thus intrinsically has a cyclic property.
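To make the contrast concrete, here is a minimal Python sketch of how a reference sample fetch might resolve out-of-picture positions, first with conventional repetitive padding and then with the cyclic wrap-around that the sphere implies for ERP. The function names (conventional_pad, erp_geometry_pad, fetch) are hypothetical, and the ERP layout assumptions (row and column centers uniformly spaced in latitude and longitude, even picture width) are illustrative choices, not taken from HM-16.12.

```python
def conventional_pad(x, y, width, height):
    """Conventional (repetitive) 2D padding: clamp the position to the nearest boundary sample."""
    return min(max(x, 0), width - 1), min(max(y, 0), height - 1)


def erp_geometry_pad(x, y, width, height):
    """ERP padding that follows the sphere's cyclic property (sketch).

    Assumes integer sample positions whose row/column centers are uniformly
    spaced in latitude/longitude, an even picture width, and a vertical
    overshoot of less than one picture height.
    """
    if y < 0:
        # Past the north pole: mirror the row and move 180 degrees in longitude.
        y = -y - 1
        x += width // 2
    elif y >= height:
        # Past the south pole: same idea, mirrored about the bottom edge.
        y = 2 * height - y - 1
        x += width // 2
    # Longitude is cyclic: the left edge continues at the right edge.
    return x % width, y


def fetch(picture, x, y, pad):
    """Read a reference sample at (x, y), resolving out-of-picture positions with `pad`."""
    height, width = len(picture), len(picture[0])
    px, py = pad(x, y, width, height)
    return picture[py][px]
```

With a 4×2 picture [[0, 1, 2, 3], [4, 5, 6, 7]], fetch(pic, -1, 0, conventional_pad) returns 0 because the left boundary sample is repeated, whereas fetch(pic, -1, 0, erp_geometry_pad) returns 3, the sample at the opposite edge of the picture that is actually adjacent on the sphere.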
When this cyclic property is taken into account, the reference pictures of a 360 video no longer have "boundaries", as the information they contain is all wrapped around a sphere. This cyclic property holds regardless of which projection format or frame packing is used to represent the 360 video on a 2D plane. This paper presents a new geometry padding method for motion compensated prediction in 360 video coding. Unlike the conventional padding method for 2D video coding, the proposed geometry padding method extends samples outside a 2D picture's boundaries using the neighboring samples on the sphere. The geometry of the projection format is considered when performing padding: the sample corresponding to a position outside a face's boundary, which may come from another side of the same face or from another face, is derived with rectilinear projection. Each face is extended with geometry padding separately. When visualized, the faces extended with geometry padding show continuous texture that forms a natural extension of the texture inside the face. The proposed geometry padding method is implemented in the HEVC reference software HM-16.12 for the ERP and CMP projection formats. In the simulations, a total of sixteen 4K ERP videos and eight 8K ERP videos are used. The 8K ground-truth videos are converted to 4K video in the ERP and CMP projection formats, coded, and converted back to reconstructed 8K video in ERP format. The 4K ground-truth videos are either coded directly in 4K ERP, or converted to CMP with 75% of the effective samples, coded, and converted back to reconstructed 4K video in ERP format. The end-to-end spherical PSNR (S-PSNR) is then calculated between the original 8K or 4K ERP video and the reconstructed 8K or 4K ERP video. BD-rate is calculated between the unmodified reference HEVC, which uses the conventional padding method, and HEVC modified with the proposed geometry padding method. Simulation results show that geometry padding outperforms conventional padding.
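For CMP, the rectilinear-projection step described above can be sketched as follows: a position outside a face is treated as a point on that face's extended plane, turned into a 3D direction, and re-projected onto whichever cube face that direction hits. This is a minimal Python sketch under assumptions: the face names, the axis convention in the hypothetical FACES table, and the normalized face coordinates (u, v) in [-1, 1] are illustrative choices, not the HM-16.12 implementation.

```python
# Hypothetical cube-face convention: (normal, u_axis, v_axis) per face.
# Each triple is orthonormal; a face position (u, v) in [-1, 1] maps to
# the 3D point normal + u * u_axis + v * v_axis on the cube surface.
FACES = {
    "+X": ((1, 0, 0), (0, 0, -1), (0, -1, 0)),
    "-X": ((-1, 0, 0), (0, 0, 1), (0, -1, 0)),
    "+Y": ((0, 1, 0), (1, 0, 0), (0, 0, 1)),
    "-Y": ((0, -1, 0), (1, 0, 0), (0, 0, -1)),
    "+Z": ((0, 0, 1), (1, 0, 0), (0, -1, 0)),
    "-Z": ((0, 0, -1), (-1, 0, 0), (0, -1, 0)),
}


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def face_to_3d(face, u, v):
    """Point on the (possibly extended) plane of `face` at face coordinates (u, v)."""
    n, ua, va = FACES[face]
    return tuple(n[i] + u * ua[i] + v * va[i] for i in range(3))


def project_to_cube(p):
    """Rectilinear (gnomonic) projection of direction p onto the cube face it hits."""
    # The face hit by p is the one whose normal has the largest dot product with p.
    face = max(FACES, key=lambda f: _dot(p, FACES[f][0]))
    n, ua, va = FACES[face]
    d = _dot(p, n)  # positive distance along the chosen normal
    # Scale p onto that face's plane and read off its in-face coordinates.
    return face, _dot(p, ua) / d, _dot(p, va) / d


def geometry_pad(face, u, v):
    """Source face and in-face coordinates for a padded position (u, v) of `face`."""
    return project_to_cube(face_to_3d(face, u, v))
```

For example, geometry_pad("+Z", 1.5, 0.0) lands on face "+X" at u = -2/3, v = 0: a direction about 11 degrees beyond the edge of the +Z face is about 11 degrees inside the adjacent face. The returned coordinates are generally fractional, so a codec would still need to interpolate the sample value from the source face; that step, and the conversion between sample indices and (u, v), are omitted from this sketch.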
For 8K sequences, the proposed geometry padding gives an average luma (Y) BD-rate reduction of 0.3% for ERP and 0.8% for CMP; for 4K sequences, it gives an average Y BD-rate reduction of 0.2% for ERP and 1.0% for CMP. The improvement is larger for CMP than for ERP. This is because CMP has six faces, each extended separately, so geometry padding affects more out-of-boundary samples. The proposed geometry padding method is also especially effective for sequences with fast motion. For example, it achieves BD-rate reductions of 4.3%, 2.7%, 1.9%, and 2.5% for Glacier, Chairlift, Sb_in_lot, and Driving, respectively. These four sequences are all captured with moving cameras and contain fast-moving objects. As a result, they contain a lot of motion across face boundaries, which benefits from the improved padding method. Detailed simulation results can be found in JVET contribution JVET-D0075, available at http://phenix.int-evry.fr/jvet/doc_end_user/documents/4_Chengdu/wg11/JVET-D0075-v3.zip.
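The BD-rate figures above summarize how much bitrate one codec saves over another at equal quality. As an illustration only, here is a pure-Python sketch of the classic Bjøntegaard delta-rate calculation: fit log10(bitrate) as a cubic polynomial of quality (PSNR, or S-PSNR in this setting) for each codec, integrate both fits over the overlapping quality range, and convert the average log-rate difference into a percentage. The function names are hypothetical, and this sketch does not claim to reproduce the exact tool used for JVET-D0075; other variants, such as piecewise cubic interpolation, also exist.

```python
import math


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _cubic_fit(xs, ys):
    """Least-squares coefficients c0..c3 of y = c0 + c1*x + c2*x^2 + c3*x^3."""
    ata = [[sum(x ** (i + j) for x in xs) for j in range(4)] for i in range(4)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(4)]
    return _solve(ata, aty)


def _integral(c, lo, hi):
    """Definite integral of the cubic with coefficients c over [lo, hi]."""
    return sum(ci * (hi ** (i + 1) - lo ** (i + 1)) / (i + 1) for i, ci in enumerate(c))


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta rate in percent; negative means the test codec saves bits.

    Needs at least four RD points per curve with distinct PSNR values.
    """
    # Normalize PSNR to roughly [-1, 1] so the normal equations stay well
    # conditioned; an affine change of variable does not change average values.
    all_psnr = list(psnr_anchor) + list(psnr_test)
    mid = sum(all_psnr) / len(all_psnr)
    half_span = (max(all_psnr) - min(all_psnr)) / 2 or 1.0
    ta = [(d - mid) / half_span for d in psnr_anchor]
    tt = [(d - mid) / half_span for d in psnr_test]
    # Fit log10(rate) as a cubic function of (normalized) PSNR for each curve.
    ca = _cubic_fit(ta, [math.log10(r) for r in rate_anchor])
    ct = _cubic_fit(tt, [math.log10(r) for r in rate_test])
    # Average the log-rate gap over the overlapping quality interval only.
    lo, hi = max(min(ta), min(tt)), min(max(ta), max(tt))
    avg_gap = (_integral(ct, lo, hi) - _integral(ca, lo, hi)) / (hi - lo)
    return (10 ** avg_gap - 1) * 100
```

For instance, with anchor points at rates 1000, 2000, 4000, and 8000 and PSNR values of 32, 35, 38, and 41 dB, a test codec that reaches the same PSNR at a 1% lower rate at every point gives a bd_rate of about -1.0, i.e., a 1% BD-rate reduction.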