Abstract: Video frame transmission delay is critical in real-time applications such as online video gaming and live shows. A new frame must be received before its rendering deadline. Otherwise, the system buffers for a while and the user sees a frozen screen, resulting in an unsatisfactory user experience. An effective approach is to transmit frames at lower quality under poor bandwidth conditions, for example by using scalable video coding. In this paper, we propose to enhance video quality using lossy frames in two situations. First, when current frames arrive too late to be received before the rendering deadline (i.e., lost), we propose to use previously received high-resolution images to predict the future frames. Second, when the quality of the currently received frames is low (i.e., lossy), we propose to use previously received high-resolution frames to enhance the low-quality current ones. For the first case, we propose a small yet effective video frame prediction network. For the second case, we extend the video prediction network into a video enhancement network that combines current and previous frames to restore high-quality images. Extensive experimental results demonstrate that our method performs favorably against state-of-the-art algorithms in the lossy video streaming environment.
What problem does this paper attempt to address?
This paper addresses frame loss and frame quality degradation caused by limited network bandwidth in video streaming, especially in real-time applications such as online video games and live broadcasts. Specifically, the paper focuses on two issues:
1. **The current frame is not received before the rendering deadline (i.e., frame loss)**: Playback stalls while the system buffers, and the user sees a frozen screen, resulting in an unsatisfactory user experience. For this case, the authors propose to use previously received high-resolution images to predict future frames.
2. **The currently received frame is of low quality (i.e., low-resolution or lossily compressed)**: The frame arrives on time, but its resolution is too low or its compression loss too high, so its quality is poor. For this case, the authors propose to use previously received high-resolution frames to enhance the current low-quality frame.
To solve these problems, the authors propose a model named Prediction-ASSistant Network (PASS-Net). The model combines optical flow estimation, optical flow propagation, and optical flow fusion, and uses historical high-resolution frames to help predict or restore the current frame, thereby improving the quality of the video stream. PASS-Net handles two cases:
- **Frame-loss case**: Use previously received high-resolution frames to predict future frames.
- **Poor-quality-frame case**: Use previously received high-resolution frames to enhance the current low-quality frame.
In this way, PASS-Net improves video stream quality under low-bandwidth conditions and reduces the frozen screens caused by buffering and frame loss, which improves the user experience.
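The two cases above amount to a per-frame decision at render time. The following is a minimal sketch of that decision only, not the authors' implementation: the names `Action` and `choose_action`, the millisecond timestamps, and the `is_low_quality` flag are all hypothetical and introduced purely for illustration.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DISPLAY = "display"  # frame arrived on time at full quality
    ENHANCE = "enhance"  # frame arrived on time but is low quality (lossy)
    PREDICT = "predict"  # frame missed its rendering deadline (lost)


def choose_action(arrival_ms: Optional[float],
                  deadline_ms: float,
                  is_low_quality: bool) -> Action:
    """Pick how to produce the frame shown at `deadline_ms` (hypothetical helper).

    `arrival_ms` is None when the frame has not arrived at all.
    """
    # Lost case: nothing usable arrived before the deadline, so the frame
    # has to be predicted from previously received high-resolution frames.
    if arrival_ms is None or arrival_ms > deadline_ms:
        return Action.PREDICT
    # Lossy case: the frame is on time but degraded, so it is enhanced
    # with the help of previously received high-resolution frames.
    if is_low_quality:
        return Action.ENHANCE
    return Action.DISPLAY
```

For example, a frame that arrives 5 ms after its deadline maps to `Action.PREDICT`, while an on-time low-resolution frame maps to `Action.ENHANCE`.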
### Main contributions
1. **Propose a prediction-assisted video frame super-resolution method**: The method explicitly uses previous frames to help predict or restore the current frame, which helps estimate optical flow accurately and supplies rich texture details.
2. **Introduce a fusion mechanism over three optical flows**: The estimated, propagated, and warped optical flows are fused to improve the accuracy of the future optical flow, especially when the current optical flow is unavailable or the current frame has a low resolution.
3. **Show that the proposed model is more effective, efficient, and compact than existing methods**: This is verified through experiments on multiple benchmark datasets.
### Formula summary
- Optical flow propagation formula:
\[
\tilde{F}_{0 \to t} = 0.5\,t(t+1)\,F_{0 \to -2} - t(t+2)\,F_{0 \to -1}
\]
- Optical flow estimation formula:
\[
\bar{F}_{0 \to t}=E(I_{-n},\dots,I_0;I_t^{\downarrow s})
\]
- Optical flow fusion formula:
\[
\hat{F}_{0 \to t}=F(\tilde{F}_{0 \to t},\hat{F}_{0 \to t},\bar{F}_{0 \to t},I_0)
\]
- Frame synthesis formula:
\[
I_t=\varphi(W(I_0;F_{t \to 0}),\text{bic}(I_t^{\downarrow s}),F_{t \to 0};\Theta)
\]
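The warping term W(I_0; F_{t \to 0}) in the frame synthesis formula can be illustrated with a small NumPy sketch. It assumes that W denotes backward warping with bilinear sampling, which is a common convention but is not confirmed by this summary. The function name `backward_warp`, the single-channel image, and the border clamping are illustrative choices. The synthesis network \varphi and the bicubic upsampling are not shown.

```python
import numpy as np


def backward_warp(image: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a grayscale image (H, W) with a flow field (H, W, 2).

    Assumed convention: flow[y, x] = (dx, dy), and output pixel (x, y) is
    read from image position (x + dx, y + dy) using bilinear interpolation.
    Sampling positions outside the image are clamped to the border.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets act as bilinear interpolation weights.
    wx = src_x - x0
    wy = src_y - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


# Toy example: every output pixel reads from one pixel to its right.
frame0 = np.arange(12, dtype=float).reshape(3, 4)
shift_right = np.zeros((3, 4, 2))
shift_right[..., 0] = 1.0
warped = backward_warp(frame0, shift_right)
```

In this toy example each row of `warped` is the corresponding row of `frame0` shifted left by one pixel, with the last column repeated because of border clamping.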
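Together, these formulas outline the pipeline: candidate optical flows are propagated from past flows and estimated from the received frames, the candidates are fused, and a flow is then used to warp the last high-resolution frame and synthesize the target frame.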
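The coefficients in the optical flow propagation formula are consistent with a constant-acceleration motion model: if each pixel moves as x(\tau) = v\tau + 0.5a\tau^2 relative to frame 0, then solving for v and a from F_{0 \to -1} and F_{0 \to -2} and evaluating at \tau = t gives exactly 0.5t(t+1)F_{0 \to -2} - t(t+2)F_{0 \to -1}. This reading is an interpretation, not something the summary states. The sketch below checks it numerically in one dimension. The names `propagate_flow` and `displacement` are illustrative, and the same function works elementwise on NumPy flow fields of equal shape.

```python
def propagate_flow(flow_to_prev1, flow_to_prev2, t):
    """Extrapolate the flow from frame 0 to frame t.

    flow_to_prev1 is F_{0->-1} and flow_to_prev2 is F_{0->-2}.
    Inputs may be plain floats or arrays that support elementwise arithmetic.
    """
    return 0.5 * t * (t + 1) * flow_to_prev2 - t * (t + 2) * flow_to_prev1


def displacement(tau, v=3.0, a=-0.5):
    """Ground-truth 1-D displacement under constant acceleration (toy data)."""
    return v * tau + 0.5 * a * tau ** 2


# Flows from frame 0 back to the two previous frames.
f_m1 = displacement(-1)
f_m2 = displacement(-2)

# Extrapolated flow to the next frame, compared with the toy ground truth.
predicted = propagate_flow(f_m1, f_m2, t=1)
error = abs(predicted - displacement(1))
```

Under this toy trajectory the extrapolated flow matches the true displacement up to floating-point error. Real motion is only approximately constant-acceleration, which may be one reason the propagated flow is fused with other flow estimates rather than used on its own.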