Dancing with Shackles, Meet the Challenge of Industrial Adaptive Streaming Via Offline Reinforcement Learning

Lianchen Ha, Chao Zhou, Tianchi Huang, Chaoyang Li, Lifeng Sun
DOI: https://doi.org/10.1109/infocom52122.2024.10621126
2024-01-01
Abstract: Adaptive video streaming has been studied for over 10 years and has demonstrated remarkable performance. However, an adaptive video streaming algorithm does not operate independently; it relies on other components of the video system. Consequently, as those components are optimized, the gap between the traditional simulator and the real-world system continues to grow, and the adaptive video streaming algorithm must adapt to these variations. To address the challenges facing industrial adaptive video streaming, we introduce a novel offline reinforcement learning framework called Backwave. This framework leverages history logs to reduce the sim-real gap. We propose new metrics based on counterfactual reasoning to evaluate its performance, and we integrate expert knowledge to generate valuable data that mitigates the issue of data override. Furthermore, we employ curriculum learning to minimize additional errors. We deployed Backwave on a mainstream commercial short video platform, Kuaishou. In a series of A/B tests conducted over nearly one month with over 400M daily watch times, Backwave consistently outperforms prior algorithms. Specifically, Backwave reduces stall time by 0.45% to 8.52% while maintaining comparable video quality, and it improves average play duration by 0.12% to 0.16% and overall play duration by 0.12% to 0.26%.