Abstract:Video frame transmission delay is critical in real-time applications such as online video gaming, live show, etc. The receiving deadline of a new frame must catch up with the frame rendering time. Otherwise, the system will buffer a while, and the user will encounter a frozen screen, resulting in unsatisfactory user experiences. An effective approach is to transmit frames in lower-quality under poor bandwidth conditions, such as using scalable video coding. In this paper, we propose to enhance video quality using lossy frames in two situations. First, when current frames are too late to receive before rendering deadline (i.e., lost), we propose to use previously received high-resolution images to predict the future frames. Second, when the quality of the currently received frames is low~(i.e., lossy), we propose to use previously received high-resolution frames to enhance the low-quality current ones. For the first case, we propose a small yet effective video frame prediction network. For the second case, we improve the video prediction network to a video enhancement network to associate current frames as well as previous frames to restore high-quality images. Extensive experimental results demonstrate that our method performs favorably against state-of-the-art algorithms in the lossy video streaming environment.

What problem does this paper attempt to address?

This paper attempts to solve the problems of frame loss or frame quality degradation in video stream transmission due to network bandwidth limitations, especially in real - time applications (such as online video games, live broadcasts, etc.). Specifically, the paper focuses on two main issues: 1. **When the current frame fails to be received before the rendering deadline (i.e., frame loss)**: At this time, the system will encounter buffer interruption, and the user will see the screen freeze, resulting in an unsatisfactory user experience. To this end, the author proposes to use previously received high - resolution images to predict future frames. 2. **When the quality of the currently received frame is low (i.e., low - resolution or lossy compression)**: In this case, although the frame can arrive on time, due to the too - low resolution or excessive compression loss, the quality of the frame is poor. The author proposes to use previously received high - resolution frames to enhance the current low - quality frame. To solve the above problems, the author proposes a model named Prediction - ASSistant Network (PASS - Net). This model combines techniques such as optical flow estimation, optical flow propagation, and optical flow fusion, and uses historical high - resolution frames to assist in predicting or restoring the current frame, thereby improving the quality of the video stream. Specifically, PASS - Net works in two cases: - **Frame loss case**: Use previously received high - resolution frames to predict future frames. - **Frame - poor - quality case**: Use previously received high - resolution frames to enhance the current low - quality frame. Through this method, PASS - Net can effectively improve the quality of video streams under low - bandwidth conditions, reduce the possibility of buffer interruption and frame loss, and thus enhance the user experience. ### Main contributions 1. **Propose a video frame super - resolution method based on prediction assistance**: This method explicitly uses previous frames to help predict or restore the current frame, which is helpful for accurately estimating optical flow and providing rich texture details. 2. **Introduce three optical flow fusion mechanisms**: including estimated optical flow, propagated optical flow, and wrapped optical flow, to improve the accuracy of future optical flow estimation, especially in cases where the current optical flow is lost or the current frame has a low resolution. 3. **Show that the proposed model is more effective, efficient, and compact than existing methods**: This has been verified through experiments on multiple benchmark datasets. ### Formula summary - Optical flow propagation formula: \[ \tilde{F}_{0 \to t}=0.5t(t + 1)F_{0 \to - 2}-t(t + 2)F_{0 \to - 1} \] - Optical flow estimation formula: \[ \bar{F}_{0 \to t}=E(I_{-n},\dots,I_0;I_t^{\downarrow s}) \] - Optical flow fusion formula: \[ \hat{F}_{0 \to t}=F(\tilde{F}_{0 \to t},\hat{F}_{0 \to t},\bar{F}_{0 \to t},I_0) \] - Frame synthesis formula: \[ I_t=\varphi(W(I_0;F_{t \to 0}),\text{bic}(I_t^{\downarrow s}),F_{t \to 0};\Theta) \] These formulas ensure the robustness and accuracy of the model in different scenarios.

Prediction-assistant Frame Super-Resolution for Video Streaming

Low-complexity frame importance modelling and resource allocation scheme for error-resilience H.264 video streaming

Rate Optimization for Constant Quality Fine Granularity Scalable Video Streaming

Adaptive Recurrent Frame Prediction with Learnable Motion Vectors.

Efficient Super-Resolution for Compression of Gaming Videos

Frame-Level Video Caching and Transmission Scheduling Via Stochastic Learning

Delay Prediction for Real-Time Video Adaptive Transmisson over TCP.

High Efficiency Live Video Streaming With Frame Dropping

Learning-Based Quality Enhancement for Scalable Coded Video over Packet Lossy Networks

Learning Inter-Frame Information for Space-Time Video Super-Resolution

Higher Quality Live Streaming under Lower Uplink Bandwidth

Enhancement or Super-Resolution: Learning-based Adaptive Video Streaming with Client-Side Video Processing

Improving Quality of Experience by Adaptive Video Streaming with Super-Resolution

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

Towards Optimal Real-time Volumetric Video Streaming: A Rolling Optimization and Deep Reinforcement Learning Based Approach

Interactive $360^{\circ}$ Video Streaming Using FoV-Adaptive Coding with Temporal Prediction

Online Streaming Video Super-Resolution with Convolutional Look-Up Table

Video Super-Resolution Using Multi-Frames Fusion and Perceptual Loss

Video Encoding Enhancement Via Content-Aware Spatial and Temporal Super-Resolution

In-Block Prediction-Based Mixed Lossy and Lossless Reference Frame Recompression for Next-Generation Video Encoding

A Unified Model Fusing Region of Interest Detection and Super Resolution for Video Compression