Adaptive Optimal Control of Discrete-Time Linear Systems with Discounted Value: Off-Policy Reinforcement Learning

Liao Zhu, Chunxiuzi Liu, Jingsheng Xu, Ping Guo
DOI: https://doi.org/10.1109/DOCS60977.2023.10294994
2023-01-01
Abstract: This paper studies the optimal control of unknown discrete-time linear systems under a discounted infinite-horizon value function. To design a data-driven controller, we derive the discounted algebraic Riccati equation (D-ARE) and analyze the on-policy reinforcement learning (RL) algorithm. Building on this analysis, we develop two algorithms for solving the D-ARE, namely a model-based and a model-free off-policy RL algorithm, and show that the off-policy RL algorithms are equivalent to their on-policy counterpart. The model-free off-policy RL algorithm learns the D-ARE solution from system state trajectory data, without requiring knowledge of the system dynamics. Finally, a simulation of a direct current servo motor supports the effectiveness of the proposed algorithm and illustrates how the discount factor influences the resulting controller.
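For readers unfamiliar with the D-ARE, the following is a minimal sketch of the model-based side of the problem, not the paper's algorithm. It assumes the standard discounted LQR setting, with cost sum over k of gamma^k (x_k' Q x_k + u_k' R u_k) and dynamics x_{k+1} = A x_k + B u_k, for which the D-ARE reads P = Q + gamma A'PA - gamma^2 A'PB (R + gamma B'PB)^{-1} B'PA. The function name `solve_discounted_are`, the plain fixed-point iteration, and the double-integrator matrices are all illustrative assumptions; the abstract does not give the paper's exact formulation or its servo motor model.

```python
import numpy as np


def solve_discounted_are(A, B, Q, R, gamma, iters=20000, tol=1e-10):
    """Fixed-point iteration on the discounted ARE (hypothetical helper).

    Returns (P, K), where the control law is u = -K x.
    Assumes the standard discounted LQR cost; not taken from the paper.
    """
    P = np.zeros_like(Q, dtype=float)
    for _ in range(iters):
        # Gain induced by the current P: K = gamma (R + gamma B'PB)^{-1} B'PA
        K = gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
        # One Riccati step: Q + gamma A'PA - gamma^2 A'PB (R + gamma B'PB)^{-1} B'PA
        P_next = Q + gamma * A.T @ P @ (A - B @ K)
        P_next = 0.5 * (P_next + P_next.T)  # keep P numerically symmetric
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    K = gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
    return P, K


def dare_residual(A, B, Q, R, gamma, P):
    """Largest absolute entry of the D-ARE residual for a candidate P."""
    G = np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
    rhs = Q + gamma * A.T @ P @ A - gamma**2 * A.T @ P @ B @ G
    return float(np.max(np.abs(rhs - P)))


# Illustrative system (a sampled double integrator), NOT the paper's servo motor.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

for gamma in (0.5, 0.9):
    P, K = solve_discounted_are(A, B, Q, R, gamma)
    print(f"gamma={gamma}: K={K.ravel()}, residual={dare_residual(A, B, Q, R, gamma, P):.2e}")
```

Running the loop for several values of `gamma` gives a rough feel for how the discount factor changes the feedback gain. A model-free method such as the one the abstract describes would estimate the same P from trajectory data instead of using A and B directly.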