A novel data-driven controller for solid oxide fuel cell via deep reinforcement learning

Jiawen Li, Tao Yu
DOI: https://doi.org/10.1016/j.jclepro.2021.128929
IF: 11.1
2021-10-01
Journal of Cleaner Production
Abstract: Solid oxide fuel cells (SOFCs) are complex, nonlinear, time-varying systems with operational constraints. Effectively controlling and stabilizing the output voltage while preventing constraint violations is a main challenge to their wide application. To improve operating efficiency and power tracking control and to prevent constraint violations, this paper designs a data-driven adaptive proportional integral derivative (PID) controller, which maintains the output voltage at the reference value through optimal control of the hydrogen flow. Moreover, a novel large-scale deep reinforcement learning (DRL) algorithm, called the two-stage training strategy large-scale twin delayed deep deterministic policy gradient (TGSL-TD3PG), is adopted to adaptively adjust the baseline coefficients of the designed controller, drawing on the high adaptability and model-free nature of reinforcement learning. In the training of TGSL-TD3PG, multiple agents are employed simultaneously to acquire the best policy, and the training of the optimal agents is structured on the principles of imitation learning and curriculum learning. This method addresses the common problem of low robustness in conventional deep reinforcement learning and can be applied to the field of control. Moreover, the TGSL-TD3PG algorithm incorporates the baseline PID coefficients into the design objective and gives the controller the ability to adjust its coefficients online through learning. The limitations of conventional PID control, namely no adaptability, low robustness, and neglect of constraints, are therefore significantly mitigated. Simulation results show that the proposed controller enables the SOFC to track the load power demand effectively while keeping the fuel utilization ratio constant. The proposed algorithm reduces the output voltage settling time by up to 45.2% and the voltage overshoot by up to 30%, while keeping the fuel utilization constraint violation time at 0.
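The idea of a PID controller whose baseline coefficients are adjusted online can be illustrated with a minimal sketch. This is not the authors' implementation: the class `AdaptivePID`, the additive form of the gain adjustment, and the function `placeholder_policy` (a stand-in for a trained DRL agent such as TGSL-TD3PG) are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AdaptivePID:
    """Discrete PID whose gains are baseline values plus an external adjustment.

    Hypothetical illustration: the additive adjustment is an assumption,
    not the parameterization used in the paper.
    """
    kp0: float  # baseline proportional gain
    ki0: float  # baseline integral gain
    kd0: float  # baseline derivative gain
    dt: float   # sampling period in seconds
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, reference, measurement, delta=(0.0, 0.0, 0.0)):
        """Return the control signal (e.g. a hydrogen flow command)."""
        error = reference - measurement
        # Effective gains: baseline plus the adjustment chosen by the agent.
        kp = self.kp0 + delta[0]
        ki = self.ki0 + delta[1]
        kd = self.kd0 + delta[2]
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative


def placeholder_policy(error):
    """Hypothetical stand-in for a trained agent: maps the tracking error
    to gain adjustments. A real agent would be a learned network."""
    return (0.1 * abs(error), 0.0, 0.0)


# One control step: the voltage reference is 1.0 and the measurement is 0.0.
pid = AdaptivePID(kp0=2.0, ki0=1.0, kd0=0.5, dt=0.1)
error = 1.0 - 0.0
u = pid.step(1.0, 0.0, delta=placeholder_policy(error))
```

In this sketch the agent only shifts the gains around their baseline values, so the controller falls back to ordinary PID behavior whenever the adjustment is zero.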
environmental sciences,green & sustainable science & technology,engineering, environmental