Cache-Enabled Dynamic Rate Allocation via Deep Self-Transfer Reinforcement Learning

Zhengming Zhang, Yaru Zheng, Meng Hua, Yongming Huang, Luxi Yang
DOI: https://doi.org/10.48550/arXiv.1803.11334
2018-03-30
Abstract: Caching and rate allocation are two promising approaches to support video streaming over wireless networks. However, existing rate allocation designs do not fully exploit the advantages of the two approaches. This paper investigates the cache-enabled, QoE-driven video rate allocation problem. We establish a mathematical model for this problem and point out that it is difficult to solve with traditional dynamic programming. We then propose a deep reinforcement learning approach to solve it. First, we model the problem as a Markov decision problem. Then we present a deep Q-learning algorithm with a special knowledge transfer process to find an effective allocation policy. Finally, numerical results demonstrate that the proposed solution can effectively maintain a high-quality user experience for mobile users moving among small cells. We also investigate the impact of the configuration of critical parameters on the performance of our algorithm.
Networking and Internet Architecture, Multimedia
What problem does this paper attempt to address?
The paper addresses how to optimize the Quality of Experience (QoE) of video streaming through dynamic bit-rate allocation in cache-enabled wireless networks. Specifically, it focuses on maintaining a high-quality user experience as mobile users move between small cells. The QoE considered depends on three factors: video quality (quantified by video bit-rate), packet loss, and video freeze duration.

### Problem Background

With the exponential growth in the number of smart mobile devices and the emergence of high-data-rate mobile services (such as video streaming, mobile games, and traffic condition monitoring), networks must cope with increasing demand for wireless traffic. For video streaming in particular, mobile video traffic was expected to account for more than one-third of mobile data traffic by the end of 2018. Users' demands for video loading speed and clarity keep rising, so the QoE of viewers has become an important indicator of mobile network performance.

However, mobile users often encounter video delay or sudden blurring when watching videos on mobile devices. These problems arise mainly because the video is divided into small segments: when network quality is low, the sender reduces the video resolution for the next few seconds so that playback can continue, which means video quality cannot be guaranteed.

### Research Objectives

The main objective of the paper is to propose a deep reinforcement learning method for the QoE-driven video bit-rate allocation problem in cache-enabled networks. The authors establish a mathematical model of the problem and point out that it is difficult to solve with traditional dynamic programming. They therefore propose a deep Q-learning algorithm that uses a special knowledge transfer process to find an effective allocation strategy. The method aims to maximize the bit-rate of the video blocks actually consumed while minimizing packet loss and video freeze duration.

### Main Contributions

1. **Defined a more realistic QoE standard**: The criterion considers the total cumulative bit-rate, packet loss, and video freeze time.
2. **Transformed the allocation problem into a Markov decision problem**: The network capacity and candidate bit-rates are extended from discrete spaces to continuous spaces, and the authors point out that the state-transition probability cannot be obtained directly.
3. **Proposed a deep self-transfer reinforcement learning method**: The method is used to obtain (sub-)optimal dynamic video bit-rate allocation strategies.

### Method Overview

The paper formulates the problem as a Markov decision process (MDP) and proposes an algorithm that combines continuous deep reinforcement learning, imitation learning, and transfer learning. Combining these techniques lets the algorithm handle continuous state and action spaces, which the summary credits with greater flexibility and robustness in practical applications.

### Conclusion

Numerical results in the paper show that the proposed solution can effectively maintain a high-quality user experience when mobile users move between small cells. The paper also studies how the configuration of key parameters influences the performance of the algorithm.
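
To make the QoE criterion and the Q-learning idea summarized above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the linear reward form, the weights `loss_weight` and `freeze_weight`, the `StepOutcome` fields, and the discrete list of next-state Q-values are all assumptions chosen for illustration. The paper itself works with continuous states and actions and a deep network, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Result of one allocation step (hypothetical fields for this sketch)."""
    consumed_bitrate: float  # bit-rate of the video actually played
    packet_loss: float       # amount of data lost in this step
    freeze_duration: float   # time the playback was frozen


def qoe_reward(outcome, loss_weight=1.0, freeze_weight=1.0):
    """Reward consumed bit-rate; penalize packet loss and freezes.

    The linear form and both weights are assumptions, not the paper's formula.
    """
    return (outcome.consumed_bitrate
            - loss_weight * outcome.packet_loss
            - freeze_weight * outcome.freeze_duration)


def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted reward that an MDP policy tries to maximize."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def td_target(reward, next_q_values, gamma=0.9):
    """One-step Q-learning target: r + gamma * max over next-state Q-values.

    A discrete list of candidate rates stands in for the continuous action
    space described in the paper.
    """
    return reward + gamma * max(next_q_values)


# Example: one step with some loss and a short freeze.
outcome = StepOutcome(consumed_bitrate=4.0, packet_loss=0.5, freeze_duration=0.25)
reward = qoe_reward(outcome, loss_weight=2.0, freeze_weight=4.0)
target = td_target(reward, next_q_values=[0.0, 1.0, 3.0], gamma=0.5)
```

In a deep Q-learning setup, a value such as `target` would serve as the regression target for the network's estimate of the current state-action value. How the paper's knowledge transfer process modifies that training is not covered here.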