Domain Adaptation of Reinforcement Learning Agents based on Network Service Proximity

Kaushik Dey, Satheesh K. Perepu, Pallab Dasgupta, Abir Das
DOI: https://doi.org/10.1109/NetSoft57336.2023.10175507
2023-03-02
Abstract: The dynamic and evolutionary nature of service requirements in wireless networks has motivated the telecom industry to consider intelligent self-adapting Reinforcement Learning (RL) agents for controlling the growing portfolio of network services. Infusion of many new types of services is anticipated with future adoption of 6G networks, and sometimes these services will be defined by applications that are external to the network. An RL agent trained for managing the needs of a specific service type may not be ideal for managing a different service type without domain adaptation. We provide a simple heuristic for evaluating a measure of proximity between a new service and existing services, and show that the RL agent of the most proximal service rapidly adapts to the new service type through a well-defined process of domain adaptation. Our approach enables a trained source policy to adapt to new situations with changed dynamics without retraining a new policy, thereby achieving significant savings in computation and cost. Such domain adaptation techniques may soon provide a foundation for more generalized RL-based service management in the face of rapidly evolving service types.
Machine Learning, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper addresses how Reinforcement Learning (RL) agents in wireless networks can adapt quickly to new service types and to environmental changes as service requirements change and evolve. Specifically:

1. **Diversification of service types**: Future 6G networks will introduce many new service types. Some of these will be defined by applications external to the network and will differ from the services the network already manages. An RL agent trained for one service type may therefore not be directly applicable to another without domain adaptation.
2. **Environmental changes**: The underlying radio environment also changes, due to shifts in the distribution of user equipment (UE), changing mobility patterns, and handovers between gNodeBs. Keeping existing RL policies up to date would then require frequent retraining, which is impractical in terms of compute and energy consumption.
3. **Computational efficiency and cost-effectiveness**: To address these challenges, the paper proposes a domain adaptation method based on network service proximity that lets a trained source policy adapt to new service types or environmental changes without retraining, yielding significant savings in computation and cost.

### Core ideas of the solution

The paper proposes the following key techniques:

- **Domain adaptation based on CycleGANs**: By learning mappings of states and actions between the source and target domains, the source-domain policy can operate effectively in the target domain. The method does not require reward signals from the target domain, which reduces the amount of new environment data needed.
- **Service proximity heuristic**: The distance between service types is computed (for example, Euclidean distance, Manhattan distance, or KL divergence), and the existing service closest to the new (target) service is selected; its trained policy serves as the source policy. The heuristic indicates in advance how well domain adaptation is likely to work and thus guides the choice of source service.
- **Experimental verification**: The paper validates the method experimentally and reports advantages in sample efficiency and adaptation speed.

### Formula representation

The formulation used in the paper is as follows:

- **State (observation) space**:
  \[ \text{Observation Space} = \left\langle K, \frac{|K - \text{target}|}{K} \right\rangle \]
  where \(K\) is the current value of the controlled variable and \(\text{target}\) is the target KPI value.
- **Action space**:
  \[ \text{Action Space} = [Prio, MBR_1, MBR_2, \cdots, MBR_N] \]
  where \(Prio\) is the service priority, \(MBR_i\) is the maximum bit rate of the \(i\)-th user equipment, and \(N\) is the number of user equipment in the service.
- **Reward function**:
  \[ \text{Reward} = -|K - \text{target}| \]

### Conclusion

By introducing domain adaptation based on network service proximity, the paper enables RL agents in wireless networks to adapt rapidly to new service types and environmental changes. The experimental results show significant advantages in sample efficiency and adaptation speed, and because no new policy has to be trained, the approach improves computational and cost efficiency.
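The service proximity heuristic described above can be illustrated with a minimal Python sketch. It assumes that each service is summarized either by a numeric profile vector (for Euclidean and Manhattan distance) or by a discrete probability distribution (for KL divergence). The paper's exact service features are not reproduced here, and the function names, service names, and example profiles are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def _check_same_length(a: Vector, b: Vector) -> None:
    if len(a) != len(b):
        raise ValueError("service profiles must have the same length")


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two service profile vectors."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """L1 distance between two service profile vectors."""
    _check_same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def kl_divergence(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    """D_KL(P || Q) for discrete distributions (not symmetric in P and Q).

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid
    division by zero.
    """
    _check_same_length(p, q)
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def most_proximal_service(
    new_profile: Vector,
    existing: Dict[str, Vector],
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> str:
    """Return the name of the existing service closest to the new service.

    Assumption: for KL divergence the direction is D_KL(new || existing).
    """
    return min(existing, key=lambda name: distance(new_profile, existing[name]))


# Hypothetical, illustrative profiles (not data from the paper).
existing_services = {
    "service_a": [0.9, 0.2, 0.1],
    "service_b": [0.1, 0.9, 0.8],
    "service_c": [0.2, 0.1, 0.3],
}
new_service = [0.8, 0.3, 0.2]
print(most_proximal_service(new_service, existing_services))             # service_a
print(most_proximal_service(new_service, existing_services, manhattan))  # service_a
```

The trained policy of the returned service would then act as the source policy for domain adaptation. Because KL divergence is asymmetric, the direction chosen in `most_proximal_service` is a design decision of this sketch, not something stated in the summary.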
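The observation, action, and reward definitions in the formula section map directly onto code. The following Python sketch mirrors those formulas for a single controlled KPI. The function names are hypothetical, and the guard against K = 0 is an added assumption, since the relative-deviation term divides by K.

```python
from typing import List, Sequence, Tuple


def observation(k: float, target: float) -> Tuple[float, float]:
    """Observation <K, |K - target| / K> for the current controlled value K."""
    if k == 0:
        # Assumption: the relative deviation is undefined when K = 0.
        raise ValueError("K must be non-zero to compute |K - target| / K")
    return (k, abs(k - target) / k)


def reward(k: float, target: float) -> float:
    """Reward -|K - target|: zero at the target, more negative farther away."""
    return -abs(k - target)


def make_action(prio: float, mbrs: Sequence[float]) -> List[float]:
    """Action vector [Prio, MBR_1, ..., MBR_N], one MBR per UE in the service."""
    return [prio, *mbrs]


# Illustrative values only (not from the paper).
print(observation(80.0, 100.0))        # (80.0, 0.25)
print(reward(80.0, 100.0))             # -20.0
print(make_action(2.0, [10.0, 20.0]))  # [2.0, 10.0, 20.0]
```

Because the reward is the negated absolute error, it reaches its maximum of 0 exactly when the KPI meets its target, so the agent is pushed toward the target from either side.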