Reinforcement Learning for Dynamic Resource Optimization in 5G Radio Access Network Slicing

Yi Shi, Yalin E. Sagduyu, Tugba Erpek
DOI: https://doi.org/10.48550/arXiv.2009.06579
2020-09-15
Abstract: The paper presents a reinforcement learning solution to dynamic resource allocation for 5G radio access network slicing. Available communication resources (frequency-time blocks and transmit powers) and computational resources (processor usage) are allocated to stochastic arrivals of network slice requests. Each request arrives with priority (weight), throughput, computational resource, and latency (deadline) requirements, and if feasible, it is served with available communication and computational resources allocated over its requested duration. As each decision of resource allocation makes some of the resources temporarily unavailable for future, the myopic solution that can optimize only the current resource allocation becomes ineffective for network slicing. Therefore, a Q-learning solution is presented to maximize the network utility in terms of the total weight of granted network slicing requests over a time horizon subject to communication and computational constraints. Results show that reinforcement learning provides major improvements in the 5G network utility relative to myopic, random, and first come first served solutions. While reinforcement learning sustains scalable performance as the number of served users increases, it can also be effectively used to assign resources to network slices when 5G needs to share the spectrum with incumbent users that may dynamically occupy some of the frequency-time blocks.
Networking and Internet Architecture, Machine Learning
What problem does this paper attempt to address?
This paper addresses dynamic resource allocation in 5G radio access network (RAN) slicing. Specifically, it studies how to allocate communication resources (frequency-time blocks and transmit power) and computational resources (processor usage) so that the priority, throughput, computational, and latency requirements of different network slice requests are met.

### Problem Background

With the development of 5G, static resource allocation can no longer keep up with growing user demand and diverse quality-of-service (QoS) requirements. 5G introduces RAN slicing, which lets multiple virtual networks share the same physical infrastructure. The open challenge is how to allocate the shared resources in a dynamic environment so that network utility is maximized and the needs of different applications are met.

### Specific Problem Description

1. **Dynamic Resource Allocation**: When a network slice request arrives, resources must be allocated according to its priority (weight), throughput, computational resource, and latency (deadline) requirements. Because each allocation makes some resources temporarily unavailable to future requests, a solution that optimizes only the current allocation (the myopic solution) is ineffective.
2. **Multi-objective Optimization**: The allocation must trade off maximizing the total weight of granted requests against using the limited communication and computational resources efficiently.
3. **Spectrum Sharing**: The 5G system may need to coexist with incumbent spectrum users (such as radar systems). When incumbents occupy some of the frequency-time blocks, the 5G system must adjust its allocation dynamically to avoid interference and make the most of the remaining spectrum.

### Solution

To solve these problems, the paper proposes a Q-learning algorithm, a form of reinforcement learning (RL), for dynamic resource allocation. The algorithm learns from the outcomes of past allocations how a decision made now affects the resources available later, and it chooses each allocation according to the current state and a reward function. Compared with the myopic, random, and first-come-first-served (FCFS) baselines, Q-learning optimizes over a longer time horizon and thereby improves network utility significantly.

### Main Contributions

- Proposes a Q-learning-based dynamic resource allocation scheme that addresses the inability of myopic solutions to optimize resource allocation over the long term.
- Considers coexistence between the 5G system and incumbent users in the spectrum-sharing scenario, and demonstrates that Q-learning remains adaptable and effective in this dynamic environment.
- Verifies through simulation the performance advantages of Q-learning in different scenarios, in particular as the number of users increases and as resource requirements change.

In summary, the paper applies reinforcement learning to the dynamic resource allocation problem in 5G RAN slicing in order to improve overall network performance and resource utilization.
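
To make the grant-or-reject idea behind the Q-learning solution more concrete, here is a minimal tabular sketch. It is not the paper's model: it assumes a single pool of `CAPACITY` identical resource units, one unit per request, two possible request weights, and a memoryless release of busy units with probability `RELEASE_PROB` per step. All names, constants, and hyperparameters are hypothetical and chosen only for illustration.

```python
import random

# All constants below are hypothetical toy values, not taken from the paper.
CAPACITY = 3            # resource units shared by all slices
WEIGHTS = (1, 5)        # possible request priorities (weights)
RELEASE_PROB = 0.2      # per-step chance that a busy unit is freed
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration


def step(free, weight, action, rng):
    """Apply action (1 = grant, 0 = reject); return (reward, next_free, next_weight)."""
    reward = 0
    if action == 1 and free > 0:      # a grant is feasible only if a unit is free
        free -= 1
        reward = weight
    busy = CAPACITY - free
    # Assumption: each busy unit is released independently (memoryless holding time).
    free += sum(rng.random() < RELEASE_PROB for _ in range(busy))
    return reward, free, rng.choice(WEIGHTS)


def train(steps=50_000, seed=0):
    """Tabular Q-learning over states (free units, weight of the current request)."""
    rng = random.Random(seed)
    q = {(f, w): [0.0, 0.0] for f in range(CAPACITY + 1) for w in WEIGHTS}
    free, weight = CAPACITY, rng.choice(WEIGHTS)
    for _ in range(steps):
        state = (free, weight)
        if rng.random() < EPSILON:    # explore with a random action
            action = rng.randrange(2)
        else:                         # exploit the current estimate
            action = max((0, 1), key=lambda a: q[state][a])
        reward, free, weight = step(free, weight, action, rng)
        # Standard update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + GAMMA * max(q[(free, weight)])
        q[state][action] += ALPHA * (target - q[state][action])
    return q


def greedy_policy(q):
    """Turn a Q-table into a policy that picks the higher-valued action."""
    return lambda free, weight: max((0, 1), key=lambda a: q[(free, weight)][a])


def evaluate(policy, steps=20_000, seed=1):
    """Total weight of granted requests when following `policy`."""
    rng = random.Random(seed)
    free, weight = CAPACITY, rng.choice(WEIGHTS)
    total = 0
    for _ in range(steps):
        reward, free, weight = step(free, weight, policy(free, weight), rng)
        total += reward
    return total


if __name__ == "__main__":
    q = train()
    print("learned policy:", evaluate(greedy_policy(q)))
    print("grant-if-feasible:", evaluate(lambda free, weight: 1))
```

The second baseline grants every feasible request, loosely in the spirit of the myopic and FCFS baselines named in the summary. In a toy setting like this, the learned policy may come to reject low-weight requests when few units are free so that capacity is kept for high-weight arrivals, but the sketch makes no claim about reproducing the paper's results.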