PPO-ABR: Proximal Policy Optimization based Deep Reinforcement Learning for Adaptive BitRate streaming

Mandan Naresh, Paresh Saxena, Manik Gupta

DOI: https://doi.org/10.48550/arXiv.2305.08114

2023-05-14

Abstract: Providing a high Quality of Experience (QoE) for video streaming in 5G and beyond 5G (B5G) networks is challenging due to the dynamic nature of the underlying network conditions. Several Adaptive Bit Rate (ABR) algorithms have been developed to improve QoE, but most of them are designed based on fixed rules and are unsuitable for a wide range of network conditions. Recently, Deep Reinforcement Learning (DRL) based Asynchronous Advantage Actor-Critic (A3C) methods have demonstrated promise in their ability to generalise to diverse network conditions, but they still have limitations. One specific issue with A3C methods is the lag between each actor's behavior policy and the central learner's target policy. Consequently, suboptimal updates emerge when the behavior and target policies fall out of synchronization. In this paper, we address the problems faced by vanilla A3C by integrating an on-policy multi-agent DRL method into the existing video streaming framework. Specifically, we propose a novel system for ABR generation: Proximal Policy Optimization-based DRL for Adaptive Bit Rate streaming (PPO-ABR). Our proposed method improves the overall video QoE by maximizing sample efficiency using a clipped probability ratio between the new and the old policies on multiple epochs of minibatch updates. The experiments on real network traces demonstrate that PPO-ABR outperforms state-of-the-art methods for different QoE variants.
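For orientation, the "clipped probability ratio" mentioned in the abstract is usually written as the standard PPO clipped surrogate objective. The abstract itself does not state the formula, so the notation below (the ratio r_t, the advantage estimate \hat{A}_t, and the clip range \epsilon) follows the common PPO formulation and is an assumption about what the paper uses, not a quotation from it.

```latex
% Standard PPO clipped surrogate objective (assumed notation, not quoted from the paper).
% r_t(\theta): probability ratio between the new policy and the old (data-collecting) policy.
% \hat{A}_t: advantage estimate at step t; \epsilon: clip range hyperparameter.
\[
  r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
  \qquad
  L^{\mathrm{CLIP}}(\theta) =
  \hat{\mathbb{E}}_t\!\left[
    \min\!\Big(
      r_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t
    \Big)
  \right]
\]
```

Because the objective stops rewarding moves of r_t outside [1-\epsilon, 1+\epsilon], the same batch of experience can be reused for several epochs of minibatch updates without the new policy drifting far from the policy that generated the data.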

Multimedia

What problem does this paper attempt to address?

This paper aims to address the challenge of providing a high Quality of Experience (QoE) for video streaming in 5G and beyond-5G (B5G) networks. Specifically, the paper points out that most traditional Adaptive Bit Rate (ABR) algorithms are designed around fixed rules and adapt poorly to widely varying network conditions. Moreover, although recent Deep Reinforcement Learning (DRL) based methods such as Asynchronous Advantage Actor-Critic (A3C) have shown potential in dealing with variable network conditions, they still have limitations. In particular, each actor's behavior policy lags behind the central learner's target policy, which leads to suboptimal updates when the two are out of sync. To solve these problems, the paper proposes a new system, Proximal Policy Optimization-based DRL for Adaptive Bit Rate streaming (PPO-ABR). This system improves sample efficiency by running multiple epochs of minibatch updates while using a clipped probability ratio to limit how far the new policy can move from the old one. Experimental results show that PPO-ABR outperforms existing state-of-the-art methods on real network traces and improves the overall QoE of video streaming.

The main contributions of the paper are:

- Proposing PPO-ABR, an improved DRL method for optimizing ABR selection in video streaming.
- Addressing the desynchronization between the behavior policy and the target policy in the A3C method by clipping the probability ratio.
- Experimentally verifying the superior performance of PPO-ABR under different QoE metrics, especially when dealing with rapidly changing network conditions.
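The clipping described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names `clipped_surrogate` and `batch_objective`, the default clip range `eps=0.2`, and the toy inputs are all hypothetical, and the code follows the standard PPO clipped surrogate rather than anything specified in the summary.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate term (a quantity to maximize).

    new_logp / old_logp are log-probabilities of the chosen bitrate action
    under the new and old policies; eps is an assumed clip range.
    """
    ratio = math.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the minimum keeps the pessimistic estimate, so the policy gains
    # nothing by moving the ratio outside [1 - eps, 1 + eps].
    return min(ratio * advantage, clipped_ratio * advantage)


def batch_objective(new_logps, old_logps, advantages, eps=0.2):
    """Mean clipped surrogate over one minibatch of samples."""
    terms = [
        clipped_surrogate(n, o, a, eps)
        for n, o, a in zip(new_logps, old_logps, advantages)
    ]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Toy minibatch: the first sample's ratio (1.5) exceeds 1 + eps and is capped.
    new_logps = [math.log(1.5), math.log(1.0), math.log(0.5)]
    old_logps = [0.0, 0.0, 0.0]
    advantages = [1.0, 2.0, -1.0]
    print(batch_objective(new_logps, old_logps, advantages))
```

In a training loop, a term like this would be evaluated repeatedly over several epochs of minibatches drawn from the same batch of experience, which is where the sample-efficiency benefit mentioned in the summary comes from.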

DRL Empowered On-policy and Off-policy ABR for 5G Mobile Ultra-HD Video Delivery

Deep-Reinforcement-Learning-based User-Preference-Aware Rate Adaptation for Video Streaming

Deep Reinforcement Learning with Importance Weighted A3C for QoE enhancement in Video Delivery Services

Adaptive Video Streaming Based on Learning Intrinsic Reward

Learning Tailored Adaptive Bitrate Algorithms to Heterogeneous Network Conditions: A Domain-Specific Priors and Meta-Reinforcement Learning Approach

Joint QoS Control and Bitrate Selection for Video Streaming Based on Multi-agent Reinforcement Learning

Adaptive Bitrate Streaming in Wireless Networks With Transcoding at Network Edge Using Deep Reinforcement Learning

Latency Aware Adaptive Video Streaming Using Ensemble Deep Reinforcement Learning

Learning-Based Online QoE Optimization in Multi-Agent Video Streaming

RAV: Learning-Based Adaptive Streaming to Coordinate the Audio and Video Bitrate Selections

Tiyuntsong: A Self-Play Reinforcement Learning Approach for ABR Video Streaming

Reinforcement learning-based QoE-oriented dynamic adaptive streaming framework

COCKTAIL: Video streaming QoE optimization with chunk replacement and guided learning

Throughput Prediction-Enhanced RL for Low-Delay Video Application

VASE: Enhancing Adaptive Bitrate Selection for VBR-Encoded Audio and Video Content with Deep Reinforcement Learning

Improving Generalization for Neural Adaptive Video Streaming Via Meta Reinforcement Learning

Optimizing Video Streaming in Dynamic Networks: An Intelligent Adaptive Bitrate Solution Considering Scene Intricacy and Data Budget

Queue-Learning-Based QoE Optimization for Super-Resolution-Assisted Adaptive Video Streaming

360HRL: Hierarchical Reinforcement Learning Based Rate Adaptation for 360-Degree Video Streaming

MetaABR: A Meta-Learning Approach on Adaptative Bitrate Selection for Video Streaming