Interactions of salts and denaturing agents with a polyacrylamide gel.

T. S. Pierre,W. Jencks

DOI: https://doi.org/10.1016/0003-9861(69)90491-3

IF: 4.114

1969-08-01

Archives of Biochemistry and Biophysics

An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems

Mohammad Alsalti,Victor G. Lopez,Matthias A. Müller

2024-08-20

Abstract:In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm only relies on using persistently exciting input-output data, measured offline. No model knowledge or state measurements are needed and the obtained optimal policy only uses past input-output information. Moreover, our formulation of the proposed algorithm renders it computationally efficient. We provide conditions that guarantee the convergence of the algorithm to the optimal solution. Finally, the performance of our method is compared to existing algorithms in the literature.

Systems and Control
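The idea behind the off-policy Q-learning entry above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes measured states rather than past input-output data, a known initial stabilizing gain `K0`, and noise-free transitions. The function name `q_learning_lqr` and all of its arguments are hypothetical. It runs policy iteration on the quadratic Q-function matrix `H`, fitting the Bellman equation by least squares on data collected offline under an arbitrary exciting input.

```python
import numpy as np

def q_learning_lqr(X, U, Xn, Q, R, K0, iters=20):
    """Off-policy Q-function policy iteration from transitions (x_k, u_k, x_{k+1}).

    Assumes state measurements and a stabilizing initial gain K0 (u = -K0 x).
    Returns the learned gain K and the Q-function matrix H.
    """
    n, m = X.shape[1], U.shape[1]
    K = K0.copy()
    Z = np.hstack([X, U])  # z_k = [x_k; u_k], generated by the behavior policy
    # Stage cost x'Qx + u'Ru for every sample
    cost = np.einsum("ki,ij,kj->k", X, Q, X) + np.einsum("ki,ij,kj->k", U, R, U)
    for _ in range(iters):
        # Successor pair under the target policy: z'_{k+1} = [x_{k+1}; -K x_{k+1}]
        Zn = np.hstack([Xn, -Xn @ K.T])
        # Bellman equation: z_k' H z_k - z'_{k+1}' H z'_{k+1} = stage cost
        Phi = np.einsum("ki,kj->kij", Z, Z) - np.einsum("ki,kj->kij", Zn, Zn)
        h, *_ = np.linalg.lstsq(Phi.reshape(len(Z), -1), cost, rcond=None)
        H = h.reshape(n + m, n + m)
        H = 0.5 * (H + H.T)  # keep the symmetric part
        # Policy improvement: u = -H_uu^{-1} H_ux x
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K, H
```

Because the Bellman equation only needs transitions, the same data set is reused at every iteration, which is what makes the scheme off-policy. Replacing the state `x_k` by a stack of past inputs and outputs is the step that the output-feedback formulations in this list address.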
A Learning-Based Optimal Tracking Controller for Continuous Linear Systems with Unknown Dynamics: Theory and Case Study

Jingren Zhang,Qingfeng Wang,Tao Wang

DOI: https://doi.org/10.1177/0020294020915213

2020-01-01

Measurement and Control

Abstract:In this article, a novel continuous-time optimal tracking controller is proposed for the single-input-single-output linear system with completely unknown dynamics. Unlike existing solutions to the optimal tracking control problem, the proposed controller introduces an integral compensation to reduce the steady-state error and regulates the feedforward part simultaneously with the feedback part. An augmented system composed of the integral compensation, error dynamics, and desired trajectory is established to formulate the optimal tracking control problem. The input energy and tracking error of the optimal controller are minimized according to the objective function in the infinite horizon. With the application of reinforcement learning techniques, the proposed controller does not require any prior knowledge of the system drift or input dynamics. The integral reinforcement learning method is employed to approximate the Q-function and update the critic network on-line, and the actor network is updated with the deterministic learning method. The Lyapunov stability is proved under the persistence of excitation condition. A case study on a hydraulic loading system has shown the effectiveness of the proposed controller by simulation and experiment.
The Adaptive Optimal Output Feedback Tracking Control of Unknown Discrete-Time Linear Systems Using a Multistep Q-Learning Approach

Xunde Dong,Yuxin Lin,Xudong Suo,Xihao Wang,Weijie Sun

DOI: https://doi.org/10.3390/math12040509

IF: 2.4

2024-02-07

Mathematics

Abstract:This paper investigates the output feedback (OPFB) tracking control problem for discrete-time linear (DTL) systems with unknown dynamics. To solve this problem, we use an augmented system approach, which first transforms the tracking control problem into a regulation problem with a discounted performance function. The solution to this problem is derived using a Bellman equation, based on the Q-function. In order to overcome the challenges of unmeasurable system state variables, we employ a multistep Q-learning algorithm that surpasses the advantages of the policy iteration (PI) and value iteration (VI) techniques and state reconstruction methods for output feedback control. As such, the requirement for an initial stabilizing control policy for the PI method is removed and the convergence speed of the learning algorithm is improved. Finally, we demonstrate the effectiveness of the proposed scheme using a simulation example.

mathematics
Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning.

Lingzhi Zhang,Lei Xie,Yi Jiang,Zhishan Li,Xueqin Liu,Hongye Su

DOI: https://doi.org/10.1109/tnnls.2023.3326397

IF: 11.8

2024-01-01

IEEE Transactions on Neural Networks and Learning Systems

Abstract:The state and input constraints of nonlinear systems could greatly impede the realization of their optimal control when using reinforcement learning (RL)-based approaches since the commonly used quadratic utility functions cannot meet the requirements of solving constrained optimization problems. This article develops a novel optimal control approach for constrained discrete-time (DT) nonlinear systems based on safe RL. Specifically, a barrier function (BF) is introduced and incorporated with the value function to help transform a constrained optimization problem into an unconstrained one. Meanwhile, the minimum of such an optimization problem can be guaranteed to occur at the origin. Then a constrained policy iteration (PI) algorithm is developed to realize the optimal control of the nonlinear system and to enable the state and input constraints to be satisfied. The constrained optimal control policy and its corresponding value function are derived through the implementation of two neural networks (NNs). Performance analysis shows that the proposed control approach still retains the convergence and optimality properties of the traditional PI algorithm. Simulation results of three examples reveal its effectiveness.
Output Feedback H∞ Control for Discrete Time Singularly Perturbed Systems with Markov Lossy Network: the Round-Robin-like Protocol Case

Yue Hu,Oh-Min Kwon,Chenxiao Cai,Yeong-Jae Kim

DOI: https://doi.org/10.1016/j.amc.2023.128338

IF: 4.397

2024-01-01

Applied Mathematics and Computation

Abstract:In this paper, the output feedback H∞ control is investigated for the networked singularly perturbed systems (SPSs) with Markov lossy network, in which the packet loss probability varies in different modes. A new network mode-dependent round-robin-like protocol (RRLP) is introduced to the networked SPSs, which can dynamically adjust the number of the selected sensor nodes according to the network environment condition. Based on this RRLP and Markov lossy network, a Markov mode and scheduling order co-dependent output feedback controller is constructed to ensure the system control performance. By constructing a novel Lyapunov-Krasovskii functional dependent on the singular perturbation parameter, Markov mode, and scheduling order simultaneously, sufficient conditions of mean square stability for SPSs are obtained, and numerical stiffness is avoided. Finally, a direct current motor-controlled inverted pendulum example is proposed to verify the effectiveness of our main results.
Output Feedback Adaptive Robust Learning Control of a Class of Nonlinear Systems with Periodic Disturbances

Xiangbin Liu,Zhongsheng Hou,Bin Yao,Hongye Su,Mingxuan Sun

DOI: https://doi.org/10.3182/20110828-6-IT-1002.01812

2011-01-01

Abstract:In this paper, a discontinuous projection-based output feedback adaptive robust learning control (OARLC) scheme is constructed for a class of nonlinear systems in a semi-strict feedback form by incorporating an observer and a dynamic normalization signal. Since only output signal is available for measurement, an observer is firstly designed to provide exponentially convergent estimates of the unmeasurable states. Using certain known basis functions to capture the characteristics of unknown general periodic disturbances, the discontinuous projection type adaptation law can then be used to tune the amplitudes of those basis functions on-line to recover the unknown general periodic disturbances asymptotically. The estimation errors due to the unknown initial states, uncompensated disturbances, and the uncertain nonlinearities are also effectively dealt with via certain robust feedback at each step of the proposed OARLC backstepping design. The resulting controller achieves a guaranteed transient and a prescribed final tracking accuracy for output tracking performance. In addition, when the general periodic disturbances fall within the approximation ranges of the periodic basis functions, asymptotic output tracking performance is achieved as well. © 2011 IFAC.
Reinforcement Learning Controller Design for Affine Nonlinear Discrete-Time Systems Using Online Approximators

Qinmin Yang,Sarangapani Jagannathan

DOI: https://doi.org/10.1109/tsmcb.2011.2166384

2011-01-01

IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics)

Abstract:In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed by using online approximators (OLAs) for general multi-input multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities: an action network that is designed to produce the optimal signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and the critic networks, although any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus the separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.
Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS with Unidentified Exosystem Dynamics.

Yong Xu,Zheng-Guang Wu

DOI: https://doi.org/10.1109/tnnls.2022.3172130

IF: 14.255

2024-01-01

IEEE Transactions on Neural Networks and Learning Systems

Abstract:In this article, a data-efficient off-policy reinforcement learning (RL) approach is proposed for distributed output tracking control of heterogeneous multiagent systems (HMASs) using approximate dynamic programming (ADP). Unlike existing results, in which the kinematic model of the exosystem is accessible to some or all agents, the dynamics of the exosystem are assumed to be completely unknown to all agents in this article. To overcome this difficulty, an identification algorithm using the experience-replay method is designed for each agent to identify the system matrices of the novel reference model instead of the original exosystem. Then, an output-based distributed adaptive output observer is proposed to provide estimates of the leader; the proposed observer not only has a low dimension and requires less data transmission among agents but is also implemented in a fully distributed way. In addition, a data-efficient RL algorithm is given to design the optimal controller offline along the system trajectories without solving output regulator equations. An ADP approach is developed to iteratively solve game algebraic Riccati equations (GAREs) using online state and input information, which removes the need for prior offline knowledge of the agents' system matrices. Finally, a numerical example is provided to verify the effectiveness of the theoretical analysis.
Optimal H∞ output feedback control for a class of nonlinear systems

Meiqin Liu,Senlin Zhang,Zhen Fan,Weihua Sheng

2013-01-01

Abstract:This paper discusses optimal H∞ output feedback control using linear matrix inequalities for a class of systems with sector bounded nonlinearities. A unified model, which is the interconnection of a linear dynamic system and a sector bounded static nonlinear operator, is proposed to describe these systems. Based on the H∞ performance analysis of the closed-loop systems including the unified model and output feedback controller, the parameters of output controller are determined not only to guarantee global asymptotic stability of the closed-loop system without disturbances, but also to reduce the effect of external disturbance on the performance output to a minimal H∞ norm constraint. The nonlinear systems satisfying certain sector type constraints can be transformed into this unified model, and H∞ output feedback controllers are designed for these systems in a unified way. © 2013 TCCT, CAA.
Optimal dynamic output feedback control of unknown linear continuous-time systems by adaptive dynamic programming

Kedi Xie,Yiwei Zheng,Yi Jiang,Weiyao Lan,Xiao Yu

DOI: https://doi.org/10.1016/j.automatica.2024.111601

IF: 6.4

2024-03-04

Automatica

Abstract:In this paper, we present an approximate optimal dynamic output feedback control learning algorithm to solve the linear quadratic regulation problem for unknown linear continuous-time systems. First, a dynamic output feedback controller is designed by constructing the internal state. Then, an adaptive dynamic programming based learning algorithm is proposed to estimate the optimal feedback control gain by only accessing the input and output data. By adding a constructed virtual observer error into the iterative learning equation, the proposed learning algorithm with the new iterative learning equation is immune to the observer error. In addition, the value iteration based learning equation is established without storing a series of past data, which could lead to a reduction of demands on the usage of memory storage. Besides, the proposed algorithm eliminates the requirement of repeated finite window integrals, which may reduce the computational load. Moreover, the convergence analysis shows that the estimated control policy converges to the optimal control policy. Finally, a physical experiment on an unmanned quadrotor is given to illustrate the effectiveness of the proposed approach.

automation & control systems,engineering, electrical & electronic
H∞ Optimal Output Tracking Control for Markov Jump Systems: A Reinforcement Learning‐based Approach

Ying Shen,Cai-Kang Yao,Bo Chen,Wei-Wei Che,Zheng-Guang Wu

DOI: https://doi.org/10.1002/rnc.7255

IF: 3.8973

2024-01-01

International Journal of Robust and Nonlinear Control

Abstract:In this paper, the H-infinity optimal output tracking control problem for Markov jump systems is investigated, where the two cases with known or completely unknown transition probabilities are both considered. Based on game theory and H-infinity performance, quadratic cost is considered, where a discount parameter is introduced into the quadratic cost in order to track unstable systems and eliminate the assumption that the noise energy is bounded. The game coupled algebraic Riccati equation and the corresponding controller are presented by dynamic programming. The stochastic stability of the tracking error system is further investigated. Moreover, iterative and reinforcement learning-based algorithms are proposed for solving the H-infinity optimal tracking controller with known or completely unknown transition probabilities, respectively. Finally, some numerical simulations on a DC motor are performed to validate the effectiveness of the proposed results.
Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints

Miao Huang,Cong Liu,Xiaoqi He,Longhua Ma,Zheming Lu,Hongye Su

DOI: https://doi.org/10.1016/j.neucom.2020.03.061

IF: 6

2020-01-01

Neurocomputing

Abstract:In this work, output-feedback control problems for a class of discrete-time non-affine nonlinear systems with unknown control directions and input constraints are considered using a reinforcement learning (RL) method. Two neural networks (NNs) implement the control: 1) a critic NN that estimates a non-quadratic strategic utility function (SUF) and 2) an action NN that generates the optimized control input and minimizes the SUF. The implicit function theorem is applied to obtain the optimal control law since the control appears in a non-affine form. For the first time, the discrete Nussbaum gain is introduced to overcome the difficulty that the control directions are unknown, and a non-quadratic SUF is used to deal with the control constraints in the RL-based control. The theoretical derivation of the uniform ultimate boundedness of the NN weights and the closed-loop output tracking error is given. Two numerical examples are supplied to validate the proposed method.
Output-feedback Q-learning for discrete-time linear H-infinity tracking control: A Stackelberg game approach

Yunxiao Ren,Qishao Wang,Zhisheng Duan

DOI: https://doi.org/10.1002/rnc.6169

IF: 3.8973

2022-01-01

International Journal of Robust and Nonlinear Control

Abstract:In this article, an output-feedback Q-learning algorithm is proposed for the discrete-time linear system to deal with the H-infinity tracking control problem. The problem is formulated as a zero-sum game in the Stackelberg game framework with a discount factor to make the value function bounded. According to the principle of optimality, the game algebraic Riccati equation (GARE) is derived and solved by the Q-learning algorithm to get the optimal solution of the Stackelberg game without requiring the knowledge of system dynamics and state. It is proved that the solution of the algorithm converges to the optimal control input and the worst-case disturbance with excitation noises during training, and the Stackelberg strategy can achieve a lower L-2 disturbance attenuation level than the Nash one. Moreover, the impacts of the discount factor on the stability of the closed-loop system and solvability of the GARE are analyzed to provide some criteria for the choice of the discount factor. Simulation examples are provided to validate the effectiveness of the algorithm.
Near Optimal Neural Network-based Output Feedback Control of Affine Nonlinear Discrete-Time Systems

Qinmin Yang,Sarangapani Jagannathan

DOI: https://doi.org/10.1109/isic.2007.4450950

2007-01-01

Abstract:In this paper, a novel online reinforcement learning neural network (NN)-based optimal output feedback controller, referred to as adaptive critic controller, is proposed for affine nonlinear discrete-time systems, to deliver a desired tracking performance. The adaptive critic design consists of three entities: an observer to estimate the system states, an action network that produces the optimal control input, and a critic that evaluates the performance of the action network. The critic is termed adaptive as it adapts itself to output the optimal cost-to-go function, which is based on the standard Bellman equation. By using the Lyapunov approach, the uniform ultimate boundedness (UUB) of the estimation and tracking errors and weight estimates is demonstrated. The effectiveness of the controller is evaluated for the task of nanomanipulation in a simulation environment.
Novel two-dimensional off-policy Q-learning method for output feedback optimal tracking control of batch process with unknown dynamics

Huiyuan Shi,Chen Yang,Xueying Jiang,Chengli Su,Ping Li

DOI: https://doi.org/10.1016/j.jprocont.2022.03.006

IF: 3.951

2022-05-01

Journal of Process Control

Abstract:Reinforcement learning (RL) is an artificial intelligence algorithm that can learn an adaptive optimal control law online. Because previous control approaches were usually overly dependent on the model parameters of the system, and most existing RL methods are based on state feedback, their application in actual industrial production is limited. Additionally, developing accurate process system models and ensuring the closed-loop system’s control performance is more challenging, as modern businesses place a premium on product quality and economic efficiency. As a result, this work introduces a novel data-driven two-dimensional (2D) off-policy Q-learning method based on output feedback to achieve optimal tracking control for batch processes. First, the error between the actual output and the given set-point is extended into the system to ensure good tracking performance. Second, by analyzing the relationship between the value function and the Q-function obtained from the 2D system’s performance index, the 2D Bellman equation is obtained in terms of output feedback that is independent of the model parameters. The optimal control problem can be effectively solved by the proposed method when the policy iteration is executed using only the measurement data of the system along the batch and time directions. Following that, the proposed approach’s unbiasedness and convergence are rigorously proven. Finally, the simulation results for the injection molding process demonstrate that the proposed method is capable of determining the optimal control law as the number of batches increases.

automation & control systems,engineering, chemical
Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Mingduo Lin,Bo Zhao,Derong Liu

DOI: https://doi.org/10.1109/tnnls.2024.3379207

IF: 14.255

2024-01-01

IEEE Transactions on Neural Networks and Learning Systems

Abstract:Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and the reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and Q-learning, another popular RL idea, direct policy learning, is still rarely considered despite being easy to implement. To fill this gap, this article aims to develop a novel model-free policy optimization (PO) algorithm to achieve the OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized to directly improve the discounted value function of the augmented system via a gradient-based method. To implement this algorithm in a model-free manner, a model-free two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function by virtue of the sampled states and the reference trajectories. The global convergence of the model-free PO algorithm to the optimal value function is demonstrated given a sufficient number of samples and proper conditions. Finally, numerical simulation results are provided to validate the effectiveness of the present method.

computer science, artificial intelligence, theory & methods,engineering, electrical & electronic, hardware & architecture
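The two-point policy gradient mentioned in the entry above can be sketched generically as follows. This is a standard zeroth-order estimator under stated assumptions, not the authors' implementation: `J` is a hypothetical cost oracle that returns the discounted cost of a gain matrix (in the paper it would be evaluated from sampled states and reference trajectories), and `radius` and `samples` are illustrative tuning parameters.

```python
import numpy as np

def two_point_gradient(J, K, radius=0.05, samples=2000, rng=None):
    """Estimate the gradient of J at K from cost evaluations only.

    Each sample perturbs K along a random unit direction U and uses the
    symmetric difference J(K + r U) - J(K - r U).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = K.size
    grad = np.zeros(K.shape, dtype=float)
    for _ in range(samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)  # uniform direction on the unit sphere
        delta = J(K + radius * U) - J(K - radius * U)
        grad += (d / (2.0 * radius)) * delta * U
    return grad / samples
```

A gradient step `K = K - step * two_point_gradient(J, K)` then improves the policy without any model of the system. The estimate is noisy, so the number of samples trades accuracy against the number of cost evaluations.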
An Information-state based Approach to the Optimal Output Feedback Control of Nonlinear Systems

Raman Goyal,Ran Wang,Mohamed Naveed Gul Mohamed,Aayushman Sharma,Suman Chakravorty

2023-10-06

Abstract:This paper develops a data-based approach to the closed-loop output feedback control of nonlinear dynamical systems with a partial nonlinear observation model. We propose an information state based approach to rigorously transform the partially observed problem into a fully observed problem where the information state consists of the past several observations and control inputs. We further show the equivalence of the transformed and the initial partially observed optimal control problems and provide the conditions to solve for the deterministic optimal solution. We develop a data based generalization of the iterative Linear Quadratic Regulator (iLQR) to partially observed systems using a local linear time varying model of the information state dynamics approximated by an Autoregressive moving average (ARMA) model, that is generated using only the input-output data. This open-loop trajectory optimization solution is then used to design a local feedback control law, and the composite law then provides an optimum solution to the partially observed feedback design problem. The efficacy of the developed method is shown by controlling complex high dimensional nonlinear dynamical systems in the presence of model and sensing uncertainty.

Robotics,Systems and Control
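The information state described in the entry above, built from the past several observations and control inputs, can be sketched as a simple stacking operation. The exact indexing (the `q` most recent observations and the `q - 1` most recent inputs) is an assumption made for illustration, as are the names `information_state`, `ys`, `us`, and `q`.

```python
import numpy as np

def information_state(ys, us, t, q):
    """Stack y_t, ..., y_{t-q+1} and u_{t-1}, ..., u_{t-q+1} into one vector.

    ys[k] and us[k] hold the observation and the control input at time k.
    """
    if t < q - 1:
        raise ValueError("need at least q observations up to time t")
    past_y = [ys[t - i] for i in range(q)]     # y_t ... y_{t-q+1}
    past_u = [us[t - i] for i in range(1, q)]  # u_{t-1} ... u_{t-q+1}
    return np.concatenate([np.atleast_1d(v) for v in past_y + past_u])
```

A controller that maps this stacked vector to the next input treats the partially observed problem as if it were fully observed, which is the transformation the abstract refers to.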
Data‐driven disturbance compensation control for discrete‐time systems based on reinforcement learning

Lanyue Li,Jinna Li,Jiangtao Cao

DOI: https://doi.org/10.1002/acs.3793

IF: 3.369

2024-03-23

International Journal of Adaptive Control and Signal Processing

Abstract:In this article, a self‐learning disturbance compensation control method is developed, which enables the unknown discrete‐time (DT) systems to achieve performance optimization in the presence of disturbances. Different from traditional model‐based and data‐driven state feedback control methods, the developed off‐policy Q‐learning algorithm updates the state feedback controller parameters and the compensator parameters by actively interacting with the unknown environment, thus the approximately optimal tracking can be realized using only data. First, an optimal tracking problem for a linear DT system with disturbance is formulated. Then, the design for controller is achieved by solving a zero‐sum game problem, leading to an off‐policy disturbance compensation Q‐learning algorithm with only a critic structure, which uses data to update disturbance compensation controller gains, without the knowledge of system dynamics. Finally, the effectiveness of the proposed method is verified by simulations.

automation & control systems,engineering, electrical & electronic
Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method

Longyan Hao,Chaoli Wang,Yibo Shi

DOI: https://doi.org/10.3390/math12101533

IF: 2.4

2024-05-15

Mathematics

Abstract:This article investigates the optimal tracking control problem for data-based stochastic discrete-time linear systems. An average off-policy Q-learning algorithm is proposed to solve the optimal control problem with random disturbances. Compared with the existing off-policy reinforcement learning (RL) algorithm, the proposed average off-policy Q-learning algorithm avoids the assumption of an initial stability control. First, a pole placement strategy is used to design an initial stable control for systems with unknown dynamics. Second, the initial stable control is used to design a data-based average off-policy Q-learning algorithm. Then, this algorithm is used to solve the stochastic linear quadratic tracking (LQT) problem, and a convergence proof of the algorithm is provided. Finally, numerical examples show that this algorithm outperforms other algorithms in a simulation.

mathematics
