A QUOTA-DDPG Controller for Run-to-Run Control

Zhu Ma, Tianhong Pan
DOI: https://doi.org/10.1109/cac53003.2021.9728433
2021-01-01
Abstract: A novel distributional reinforcement learning controller, the quantile option structure-based deep deterministic policy gradient (QUOTA-DDPG), is proposed for run-to-run (R2R) control of the chemical mechanical polishing (CMP) process. The algorithm adds a new policy dimension for exploration: a higher-level strategy learned via QUOTA adaptively selects the quantile used for action selection during interaction with the environment. A reward function is designed to guide the agent's learning process, based on the connection between reinforcement learning and feedback control. Simulation experiments demonstrate that the proposed controller effectively and adaptively compensates for abnormal variations in the CMP environment.
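The quantile-option idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper builds on DDPG with continuous actions, whereas this toy version assumes a small discrete set of candidate actions so the selection logic stays visible. The function names (`select_option`, `option_window`, `select_action`), the equal-width partition of quantiles into options, and the epsilon-greedy higher-level policy are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def select_option(option_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy choice over quantile options (assumed higher-level policy)."""
    if rng.random() < epsilon:
        return rng.randrange(len(option_values))
    return max(range(len(option_values)), key=lambda j: option_values[j])


def option_window(option: int, num_options: int, num_quantiles: int) -> range:
    """Indices of the quantiles assigned to an option (equal-width split, assumed)."""
    width = num_quantiles // num_options
    return range(option * width, (option + 1) * width)


def select_action(quantiles: List[List[float]], option: int,
                  num_options: int) -> int:
    """Pick the candidate action with the highest mean over the option's quantiles."""
    window = option_window(option, num_options, len(quantiles[0]))

    def score(action: int) -> float:
        return sum(quantiles[action][k] for k in window) / len(window)

    return max(range(len(quantiles)), key=score)


# Toy data: two candidate actions, four return quantiles each (low to high).
quantiles = [
    [0.0, 1.0, 2.0, 9.0],  # action 0: wide distribution with a heavy upper tail
    [2.0, 2.5, 3.0, 3.5],  # action 1: narrow distribution
]

# Option 0 looks only at the lower quantiles, option 1 at the upper ones.
pessimistic = select_action(quantiles, option=0, num_options=2)
optimistic = select_action(quantiles, option=1, num_options=2)
print(pessimistic, optimistic)
```

In this toy setting, the lower-quantile option prefers the narrow-distribution action and the upper-quantile option prefers the heavy-tailed one, which is the sense in which choosing a quantile acts as an extra exploration dimension.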