Abstract: In adaptive video streaming, the design of the adaptive bitrate (ABR) strategy is critical to the quality of experience (QoE) perceived by users. Although current learning-based ABR algorithms achieve state-of-the-art performance for users whose preferences match the QoE metric setting used in training, they may generalize poorly to users with different QoE preferences. Moreover, how to quantitatively characterize an individual user's QoE preference has not been extensively studied. In this paper, we propose STEER, a successor feature-based transfer reinforcement learning framework for quickly learning ABR strategies under heterogeneous QoE preferences. Specifically, we first develop a QoE preference analysis scheme that infers the personal QoE preference of a single user from the user's actual viewing history. We then formulate the personalized QoE maximization problem as a reinforcement learning (RL) task, which optimizes the ABR strategy to maximize the overall QoE perceived by the user. Further, we model the QoE maximization problem for multiple users with heterogeneous QoE preferences as a multi-task RL problem, in which each task is distinguished by a user-specific QoE preference. To address this problem efficiently, STEER solves each RL-based ABR task by learning its optimal successor feature (SF) function, which serves as shared knowledge across tasks and facilitates transfer between them. With SF functions, STEER can quickly evaluate the optimal policies of previously learned tasks on a new task, and then apply the generalized policy improvement operation to obtain a jumpstart policy. Both theoretically and empirically, we show that this jumpstart policy is a good initial policy with a performance guarantee for better generalization in the new task, and that it also leads to faster convergence to the optimal policy of the new task.
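The transfer step described above can be illustrated with a small sketch. It is not the STEER implementation. It assumes a tabular setting, a reward that is linear in a feature vector (here the hypothetical features are bitrate utility, rebuffering time, and bitrate change), and made-up SF values for two previously learned policies. Under those assumptions, the value of an old policy on a new task is the dot product of its SF vector with the new user's preference weights, and generalized policy improvement picks the action whose best value across old policies is largest.

```python
from typing import List, Sequence, Tuple

# sf_tables[i][state][action] is the successor-feature vector of source
# policy i, i.e. the expected discounted sum of features under that policy.
SFTable = List[List[Sequence[float]]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def gpi_action(sf_tables: Sequence[SFTable], state: int,
               w_new: Sequence[float]) -> Tuple[int, float]:
    """Generalized policy improvement over previously learned policies.

    Returns (action, value), where value = max_i psi_i(state, action) . w_new.
    """
    n_actions = len(sf_tables[0][state])
    best_action, best_value = 0, float("-inf")
    for action in range(n_actions):
        # Evaluate every source policy on the new task, keep the best one.
        q = max(dot(psi[state][action], w_new) for psi in sf_tables)
        if q > best_value:
            best_action, best_value = action, q
    return best_action, best_value


# Illustrative numbers only (assumed, not from the paper).
# Features: (bitrate utility, rebuffering time, bitrate change).
# One state, two actions: 0 = low bitrate, 1 = high bitrate.
SOURCE_SFS: List[SFTable] = [
    [[[10.0, 1.0, 2.0], [30.0, 6.0, 4.0]]],  # policy learned for a quality-focused user
    [[[12.0, 0.5, 1.0], [25.0, 3.0, 3.0]]],  # policy learned for a stall-averse user
]

# New users: weights on the three features (penalties are negative).
mild_stall_penalty = [1.0, -4.0, -1.0]
heavy_stall_penalty = [1.0, -8.0, -1.0]

print(gpi_action(SOURCE_SFS, 0, mild_stall_penalty))   # (1, 10.0)
print(gpi_action(SOURCE_SFS, 0, heavy_stall_penalty))  # (0, 7.0)
```

In this toy case the same stored SF vectors yield different jumpstart actions for the two new users without any retraining, which is the kind of reuse the abstract attributes to SF functions.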

Adaptive Video Streaming Based on Learning Intrinsic Reward

Learning Tailored Adaptive Bitrate Algorithms to Heterogeneous Network Conditions: A Domain-Specific Priors and Meta-Reinforcement Learning Approach

Deep-Reinforcement-Learning-based User-Preference-Aware Rate Adaptation for Video Streaming

Adaptive video streaming algorithm based on meta-learning

Latency Aware Adaptive Video Streaming Using Ensemble Deep Reinforcement Learning

Survey on Reinforcement Learning Based Adaptive Bit Rate Algorithm for Mobile Video Streaming Services

Improving Generalization for Neural Adaptive Video Streaming Via Meta Reinforcement Learning

AraLive: Automatic Reward Adaption for Learning-based Live Video Streaming

Neural Adaptive Video Streaming with Offline Reinforcement Learning

Learning Accurate Network Dynamics for Enhanced Adaptive Video Streaming

RAV: Learning-Based Adaptive Streaming to Coordinate the Audio and Video Bitrate Selections

Adaptive Bitrate Streaming in Wireless Networks With Transcoding at Network Edge Using Deep Reinforcement Learning

360HRL: Hierarchical Reinforcement Learning Based Rate Adaptation for 360-Degree Video Streaming

Enhancing Neural Adaptive Wireless Video Streaming via Lower-Layer Information Exposure and Online Tuning

MetaABR: A Meta-Learning Approach on Adaptative Bitrate Selection for Video Streaming

A Learning-Based Approach for Video Streaming over Fluctuating Networks with Limited Playback Buffers

PPO-ABR: Proximal Policy Optimization based Deep Reinforcement Learning for Adaptive BitRate streaming

Successor Feature-Based Transfer Reinforcement Learning for Video Rate Adaptation with Heterogeneous QoE Preferences

Throughput Prediction-Enhanced RL for Low-Delay Video Application

360SRL: A Sequential Reinforcement Learning Approach for ABR Tile-Based 360 Video Streaming

VASE: Enhancing Adaptive Bitrate Selection for VBR-Encoded Audio and Video Content with Deep Reinforcement Learning