Zwei: A Self-Play Reinforcement Learning Framework for Video Transmission Services

Tianchi Huang, Rui-Xiao Zhang, Lifeng Sun
DOI: https://doi.org/10.1109/tmm.2021.3063620
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract: Video transmission services adopt adaptive algorithms to meet users’ demands. Existing techniques are often optimized and evaluated by a function that linearly combines several weighted metrics. Nevertheless, we observe that such a function often fails to describe the requirement accurately, so the optimization fails to produce methods that meet the requirement. We propose Zwei, a self-play reinforcement learning framework that updates the policy by directly using the actual requirement. Technically, Zwei rolls out trajectories from the same initial state and instantly estimates the win rate w.r.t. the competition outcome, where the outcome represents which trajectory is closer to the assigned requirement. We evaluate Zwei with different requirements on various video transmission tasks, including adaptive bitrate streaming, crowd-sourced live streaming scheduling, and real-time communication. Results indicate that Zwei faithfully optimizes itself according to the assigned requirement, outperforming the state-of-the-art methods under all considered scenarios. Moreover, we propose Zwei$^+$, which enables Zwei to learn policies in the vanilla no-regret reinforcement learning scenario. We validate Zwei$^+$ on the adaptive bitrate streaming task and show the superiority of the proposed method over existing state-of-the-art approaches.
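The win-rate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical requirement ("less rebuffering first, then higher bitrate"), hypothetical trajectory fields `rebuffer` and `bitrate`, and that a tie counts as half a win. The names `compare` and `win_rates` are also illustrative only.

```python
from itertools import combinations


def compare(a, b):
    """Return 1 if trajectory a is closer to the requirement than b,
    -1 if b is closer, and 0 on a tie.

    Hypothetical requirement (an assumption, not taken from the paper):
    prefer less rebuffering; break ties by higher bitrate.
    """
    if a["rebuffer"] != b["rebuffer"]:
        return 1 if a["rebuffer"] < b["rebuffer"] else -1
    if a["bitrate"] != b["bitrate"]:
        return 1 if a["bitrate"] > b["bitrate"] else -1
    return 0


def win_rates(trajectories):
    """Estimate each trajectory's win rate from pairwise competitions.

    All trajectories are assumed to be rolled out from the same initial
    state, so their outcomes are directly comparable.
    """
    n = len(trajectories)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    scores = [0.0] * n
    for i, j in combinations(range(n), 2):
        outcome = compare(trajectories[i], trajectories[j])
        if outcome > 0:
            scores[i] += 1.0
        elif outcome < 0:
            scores[j] += 1.0
        else:
            # Assumption: a tie is split evenly between both sides.
            scores[i] += 0.5
            scores[j] += 0.5
    # Each trajectory plays n - 1 competitions.
    return [s / (n - 1) for s in scores]


# Three hypothetical rollouts from the same initial state.
rollouts = [
    {"rebuffer": 0.0, "bitrate": 1.2},
    {"rebuffer": 0.0, "bitrate": 2.0},
    {"rebuffer": 1.5, "bitrate": 4.0},
]
rates = win_rates(rollouts)
```

In a training loop, win rates like these could stand in for a hand-weighted linear reward when updating the policy. How Zwei actually performs that update is not specified in the abstract.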