Shuvangkar Chandra Das, Tuyen Vu, Deepak Ramasubramanian, Evangelos Farantatos, Jianhua Zhang, Thomas Ortmeyer
Abstract: This paper presents novel methods for tuning inverter controller gains using deep reinforcement learning (DRL). A Simulink-developed inverter model is converted into a dynamic link library (DLL) and integrated with a Python-based RL environment, leveraging multi-core deployment and accelerated computing to significantly reduce RL training time. A neural network-based mechanism is developed to transform the cascaded PI controller into an actor network, allowing optimized gain tuning by an RL agent to mitigate scenarios such as subsynchronous oscillations (SSO) and initial transients. Two distinct tuning approaches are demonstrated: a fixed gain strategy, where controller gains are represented as RL policy (actor network) weights, and an adaptive gain strategy, where gains are dynamically generated as RL policy (actor network) outputs. A comparative analysis of these methods is provided, showcasing their effectiveness in stabilizing the transient performance of grid-forming and grid-following converters, as well as the challenges of deploying them in hardware. Experimental results are presented, demonstrating the enhanced robustness and practical applicability of the RL-tuned controller gains in real-world systems.
What problem does this paper attempt to address?
This paper addresses the tuning of inverter controller gains in power systems. Specifically, the authors use deep reinforcement learning (DRL) to adjust the gains of inverter controllers and thereby improve power system stability. The paper focuses on two tuning strategies:
1. **Fixed-gain strategy**: The controller gains are represented as the weights of the RL policy (the actor network). Training the policy therefore optimizes the gain values directly, with the aim of mitigating subsynchronous oscillations (SSO) and initial transients.
2. **Adaptive-gain strategy**: The gains are generated dynamically as outputs of the RL policy (the actor network). This lets the controller adjust its gains to real-time operating conditions, which is intended to further improve stability and response speed.
To achieve this, the authors developed a mechanism that converts the cascaded PI controller into an actor network and trained it with a deep reinforcement learning algorithm such as PPO. They also proposed a hybrid workflow in which the electromagnetic transient (EMT) model developed in Simulink is converted into a dynamic link library (DLL) and integrated with a Python-based RL environment, using multi-core deployment and accelerated computing to significantly reduce RL training time.
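The difference between the two strategies can be illustrated with a minimal sketch. It assumes a single PI loop rather than the cascaded structure used in the paper, and the class names, the hidden-layer shape, and the sigmoid bounding of the gains are illustrative assumptions, not the authors' implementation. In the fixed-gain case the PI law is a bias-free linear layer whose two weights are the gains; in the adaptive-gain case a small network outputs the gains at every step.

```python
import math


class FixedGainPIActor:
    """PI law u = Kp*e + Ki*integral(e), viewed as a bias-free linear layer.

    The layer weights ARE the controller gains, so training the policy
    weights is equivalent to tuning Kp and Ki.
    """

    def __init__(self, kp, ki):
        self.weights = [kp, ki]  # trainable parameters = [Kp, Ki]

    def act(self, error, error_integral):
        kp, ki = self.weights
        return kp * error + ki * error_integral


class AdaptiveGainPIActor:
    """Small one-hidden-layer network that outputs (Kp, Ki) at every step.

    Assumption for illustration: gains are bounded to (0, kp_max) and
    (0, ki_max) with a sigmoid; the paper may bound them differently.
    """

    def __init__(self, w_hidden, w_out, kp_max, ki_max):
        self.w_hidden = w_hidden  # rows of [w_error, w_integral, bias]
        self.w_out = w_out        # two rows: hidden weights followed by a bias
        self.kp_max = kp_max
        self.ki_max = ki_max

    def gains(self, error, error_integral):
        hidden = [math.tanh(w[0] * error + w[1] * error_integral + w[2])
                  for w in self.w_hidden]
        raw = [sum(wi * h for wi, h in zip(row[:-1], hidden)) + row[-1]
               for row in self.w_out]
        kp = self.kp_max / (1.0 + math.exp(-raw[0]))
        ki = self.ki_max / (1.0 + math.exp(-raw[1]))
        return kp, ki

    def act(self, error, error_integral):
        # The gains are policy outputs; the control signal is still a PI law.
        kp, ki = self.gains(error, error_integral)
        return kp * error + ki * error_integral
```

In this sketch the fixed-gain actor has only two parameters and is trivially exportable to a conventional PI block after training, while the adaptive-gain actor must run the network online to produce its gains.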
### Main contributions of the paper:
1. **New neural network mechanism**: A neural-network-based mechanism converts the inverter's cascaded PI controller into an actor network, enabling an RL agent to optimize the gains, particularly for scenarios such as transients and subsynchronous oscillations.
2. **Two tuning methods**: Fixed-gain and adaptive-gain methods are proposed and compared, demonstrating their effectiveness in stabilizing the transient performance of grid-forming and grid-following converters.
3. **Efficient training pipeline**: A pipeline for tuning inverter-based resource (IBR) controllers is proposed, in which the inverter model developed in an EMT simulation platform (such as Simulink) is converted into a DLL and integrated with an RL environment in a programming environment (such as Python), using multi-core deployment and accelerated computing to significantly reduce training time.
4. **Experimental results**: Experiments demonstrate the improved performance of the RL-tuned controller gains and their practical applicability.
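The environment side of such a pipeline can be sketched as follows. This is a rough illustration under stated assumptions: `InverterEnv`, `toy_plant_step`, the observation layout, and the squared-error reward are all hypothetical. In the paper's pipeline the step function would be exported by the DLL compiled from the Simulink model; here a first-order RL circuit integrated with forward Euler stands in for it so that the example runs on its own.

```python
def toy_plant_step(current, voltage, dt, resistance=1.0, inductance=0.01):
    """Stand-in plant: forward-Euler step of L di/dt = v - R i.

    Hypothetical placeholder for the compiled EMT model. A real pipeline
    would call a function exported by the DLL instead (for example one
    loaded with ctypes.CDLL; path and symbol names depend on the build).
    """
    return current + dt * (voltage - resistance * current) / inductance


class InverterEnv:
    """Minimal reset/step environment wrapping a plant step function."""

    def __init__(self, plant_step, i_ref, dt):
        self.plant_step = plant_step  # callable(current, action, dt) -> current
        self.i_ref = i_ref
        self.dt = dt
        self.current = 0.0
        self.integral = 0.0

    def reset(self):
        self.current = 0.0
        self.integral = 0.0
        return (self.i_ref - self.current, self.integral)

    def step(self, action):
        # Advance the plant by one time step with the agent's action.
        self.current = self.plant_step(self.current, action, self.dt)
        error = self.i_ref - self.current
        self.integral += error * self.dt
        reward = -error ** 2  # assumed tracking penalty, not the paper's reward
        return (error, self.integral), reward
```

Because the plant is injected as a plain callable, several independent copies of such an environment could be run in separate worker processes, which is one way the multi-core deployment described above might be organized.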
### Formula examples:
- **Objective function of the PPO algorithm**:
\[
L_{\text{CLIP}}(\theta)=\mathbb{E}_t\left[\min\left(r_t(\theta)\hat{A}_t,\text{clip}(r_t(\theta), 1 - \epsilon, 1+\epsilon)\hat{A}_t\right)\right]
\]
where \(r_t(\theta)=\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\text{old}}}(a_t|s_t)}\), \(\hat{A}_t\) is the advantage estimate, and \(\epsilon\) is a hyperparameter, usually set to 0.2.
- **Reward function**:
\[
R_t = Q_1\times(i_{Ld}^{\text{ref}}-i_{Ld})^2+Q_2\times(i_{Lq}^{\text{ref}}-i_{Lq})^2+Q_3\times a_1^2+Q_4\times a_2^2+Q_5\times\sum_{i = 1}^2|a_i-\text{LPF}(a_i)|+\begin{cases}
Q_6\times|P|&\text{if }P < 0\\
0&\text{otherwise}
\end{cases}
\]
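The two formulas above can be sketched in plain Python. Function and argument names are assumptions made for illustration. The weights \(Q_1\) to \(Q_6\) are passed in as a list because their values and signs are not given here (negative weights would turn every term into a penalty), and the low-pass-filtered actions \(\text{LPF}(a_i)\) are supplied by the caller because the filter itself is not specified.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Sample mean of min(r*A, clip(r, 1-eps, 1+eps)*A) over a batch."""
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)


def reward(i_ld_ref, i_ld, i_lq_ref, i_lq, actions, filtered_actions, power, q):
    """Evaluate R_t term by term.

    actions          -- (a_1, a_2), the two agent actions
    filtered_actions -- (LPF(a_1), LPF(a_2)), computed elsewhere
    q                -- [Q_1, ..., Q_6], weights supplied by the caller
    """
    a1, a2 = actions
    total = q[0] * (i_ld_ref - i_ld) ** 2       # d-axis current tracking error
    total += q[1] * (i_lq_ref - i_lq) ** 2      # q-axis current tracking error
    total += q[2] * a1 ** 2 + q[3] * a2 ** 2    # action magnitude terms
    total += q[4] * sum(abs(a - f)              # deviation from filtered action
                        for a, f in zip(actions, filtered_actions))
    if power < 0:
        total += q[5] * abs(power)              # applies only when P < 0
    return total
```

For example, with ratios `[1.5, 0.5]`, advantages `[1.0, -1.0]`, and the default \(\epsilon = 0.2\), the per-sample terms are 1.2 and -0.8: the first ratio is clipped to 1.2, and the second is clipped to 0.8 because that gives the smaller (more pessimistic) value.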
These formulas and methods together form the core content of the paper, aiming to optimize the gains of inverter controllers through deep reinforcement learning techniques, thereby improving the stability and performance of power systems.