Improving ABR Performance for Short Video Streaming Using Multi-Agent Reinforcement Learning with Expert Guidance

Yueheng Li, Qianyuan Zheng, Zicheng Zhang, Hao Chen, Zhan Ma
2023-04-10
Abstract: In the realm of short video streaming, popular adaptive bitrate (ABR) algorithms developed for classical long video applications suffer from catastrophic failures because they are tuned to solely adapt bitrates. Instead, short video adaptive bitrate (SABR) algorithms have to properly determine which video at which bitrate level together for content prefetching, without sacrificing the users' quality of experience (QoE) and yielding noticeable bandwidth wastage jointly. Unfortunately, existing SABR methods are inevitably entangled with slow convergence and poor generalization. Thus, in this paper, we propose Incendio, a novel SABR framework that applies Multi-Agent Reinforcement Learning (MARL) with Expert Guidance to separate the decision of video ID and video bitrate in respective buffer management and bitrate adaptation agents to maximize the system-level utility score modeled as a compound function of QoE and bandwidth wastage metrics. To train Incendio, it is first initialized by imitating the hand-crafted expert rules and then fine-tuned through the use of MARL. Results from extensive experiments indicate that Incendio outperforms the current state-of-the-art SABR algorithm with a 53.2% improvement measured by the utility score while maintaining low training complexity and inference time.
Multimedia, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses how to optimize adaptive bitrate (ABR) algorithms for short-video streaming so that users' Quality of Experience (QoE) and bandwidth efficiency improve together. Existing ABR algorithms were designed mainly for traditional long-video applications and fall short on short videos: they only adapt bitrates and cannot decide which video to prefetch, which leads either to degraded user experience or to wasted bandwidth. The paper therefore proposes a new framework, Incendio, which combines Multi-Agent Reinforcement Learning (MARL) with expert guidance to optimize the short-video adaptive bitrate (SABR) decision.

The main contributions of the paper are as follows:

1. **Separate decision-making**: The choice of video ID and the choice of video bitrate are handled by two different agents, the buffer management agent (BM-agent) and the bitrate adaptation agent (BA-agent). This shrinks the decision space each agent must learn and speeds up convergence of neural network training.
2. **Utilize expert knowledge**: Incendio is pre-trained through Imitation Learning (IL) on hand-crafted expert rules, so that it starts from expert-level behavior rather than from scratch. This reduces the number of invalid trials and the risk of settling on sub-optimal solutions.
3. **Efficient reinforcement learning**: The Multi-Agent Proximal Policy Optimization (MAPPO) algorithm is then used to fine-tune Incendio and further improve its performance.

The experimental results show that Incendio improves the overall utility score by 53.2% over the current state-of-the-art SABR algorithm, while keeping training complexity and inference time low. This suggests that Incendio not only outperforms existing methods but is also practical to deploy.
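To make the decision-space argument concrete, here is a minimal sketch of how splitting the joint (video ID, bitrate) choice across two agents reduces the number of outputs each policy has to score. The window size, number of bitrate levels, and the `combine` helper are assumptions chosen for illustration only; they are not taken from the paper, and the sketch does not reproduce Incendio's networks or training.

```python
from itertools import product
from typing import Tuple

# Hypothetical sizes, chosen only for illustration (not from the paper).
NUM_VIDEOS = 5      # candidate videos in the prefetch window
NUM_BITRATES = 3    # bitrate levels available per video

# A single joint agent has to score every (video_id, bitrate_level) pair.
joint_actions = list(product(range(NUM_VIDEOS), range(NUM_BITRATES)))

# Two cooperating agents each score only their own dimension.
bm_actions = list(range(NUM_VIDEOS))    # BM-agent: which video to prefetch
ba_actions = list(range(NUM_BITRATES))  # BA-agent: at which bitrate level


def combine(video_id: int, bitrate_level: int) -> Tuple[int, int]:
    """Merge the two agents' choices into one download decision."""
    return (video_id, bitrate_level)


joint_size = len(joint_actions)                     # NUM_VIDEOS * NUM_BITRATES
factored_size = len(bm_actions) + len(ba_actions)   # NUM_VIDEOS + NUM_BITRATES

print(f"joint policy outputs:    {joint_size}")
print(f"factored policy outputs: {factored_size}")
```

Under these assumed sizes the joint policy has 15 outputs while the two factored policies have 8 between them, and every combined decision still maps to a valid joint action. The gap widens as either dimension grows, which is the intuition behind the faster convergence claimed for the two-agent design.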