Worst-Case Control and Learning Using Partial Observations Over an Infinite Time-Horizon

Aditya Dave, Ioannis Faros, Nishanth Venkatesh, Andreas A. Malikopoulos
2023-04-01
Abstract: Safety-critical cyber-physical systems require control strategies whose worst-case performance is robust against adversarial disturbances and modeling uncertainties. In this paper, we present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon. We model disturbances to the system as finite-valued uncertain variables with unknown probability distributions. For problems with known system dynamics, we construct a dynamic programming (DP) decomposition to compute the optimal control strategy. Our first contribution is to define information states that improve the computational tractability of this DP without loss of optimality. Then, we describe a simplification for a class of problems where the incurred cost is observable at each time instance. Our second contribution is defining an approximate information state that can be constructed or learned directly from observed data for problems with observable costs. We derive bounds on the performance loss of the resulting approximate control strategy and illustrate the effectiveness of our approach in partially observed decision-making problems with a numerical example.
Optimization and Control, Artificial Intelligence, Systems and Control
What problem does this paper attempt to address?
This paper addresses how to design control strategies that minimize the worst-case discounted cost in partially observed systems. Specifically, it focuses on making control strategies robust against adversarial disturbances and modeling uncertainties over an infinite time horizon. To this end, the authors propose a framework for approximate control and learning in partially observed systems.

### Core Problems of the Paper

1. **Robustness and Uncertainty**:
   - The system may be affected by adversarial disturbances and modeling uncertainties, which can degrade the actual performance of a control strategy.
   - The goal is to design a control strategy that still performs well in the worst case under such uncertainty.
2. **Partially Observed Systems**:
   - In many practical applications, the state of the system cannot be fully observed and can only be inferred from partial observations.
   - As a result, traditional control methods that rely on complete state information no longer apply, and new methods are required.
3. **Infinite Time Horizon**:
   - The control strategy must remain effective over an infinite time horizon, so the long-term accumulation of cost has to be considered.
   - Traditional dynamic programming methods face exponential growth in computational complexity in this setting, so a more tractable formulation is needed.

### Main Contributions

1. **Definition of Information State**:
   - The authors introduce a general notion of information state and define a time-invariant dynamic program (DP) to compute the optimal control strategy.
   - The information state improves computational tractability without loss of optimality.
2. **Simplification for Observable Costs**:
   - For problems where the incurred cost is observable, the authors further simplify the definition of the information state and propose the concept of an approximate information state.
   - The approximate information state can be constructed or learned directly from observed data, without a complete model of the system dynamics.
3. **Bound on Performance Loss**:
   - The authors derive an upper bound on the performance loss of the control strategy computed using the approximate information state, which provides a theoretical guarantee for practical use.
4. **Numerical Example**:
   - Through a numerical example, the authors show that an approximate information state can be learned from data without complete knowledge of the system dynamics, and they use deep Q-learning to compute the approximate control strategy.

### Formulas and Symbols

- **Discount Factor**: \(\gamma \in (0, 1)\)
- **Cost Range**: \(C\) is a bounded set with \(\min C = c_{\min}\) and \(\max C = c_{\max}\)
- **Memory Variable**: \(M_t = (Y_{0:t}, U_{0:t-1})\)
- **Value Function**:
  \[ V_g(t, m_t) = \sup_{a_t,\, c_t^{\infty} \in [[A_t, C_t^{\infty} \mid m_t]]_g} \left( a_t + \gamma^t \cdot c_t^{\infty} \right) \]
- **Optimal Value Function**:
  \[ V_t(m_t) = \inf_{g \in G} V_g(t, m_t) \]
- **Fixed-Point Iteration**:
  \[ \Lambda^{n+1}(s, z) = [T\Lambda^n](s, z) = \inf_{u \in U} \sup_{c \in C,\, s' \in S} \left( c + \gamma \cdot \Lambda^n(s', \gamma \cdot z) + \rho(c, s' \mid s, u) \cdot z^{-1} \right) \]

### Conclusion

This paper provides a method for designing robust control strategies in partially observed systems by introducing the concepts of information state and approximate information state. The method rests on a rigorous theoretical foundation, including bounds on the performance loss, and is demonstrated on a numerical example.
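
The fixed-point iteration listed under Formulas and Symbols follows the pattern of worst-case value iteration: minimize over actions, maximize over feasible outcomes, and discount the continuation value. The sketch below illustrates that pattern on a simplified stand-in, not the paper's operator: it drops the second argument \(z\) and the \(\rho\) term, and iterates \(V(s) = \min_u \max_{(c, s')} (c + \gamma V(s'))\) on a hypothetical fully observed model with two states and two actions. The names `FEASIBLE`, `bellman`, `value_iteration`, and `greedy_policy`, and all numbers, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

GAMMA = 0.9  # discount factor, assumed value in (0, 1)

STATES = [0, 1]
ACTIONS = [0, 1]

# Hypothetical toy model: for each (state, action), the set of feasible
# (cost, next_state) outcomes. No probabilities are attached; the
# disturbance may pick any feasible outcome.
FEASIBLE: Dict[Tuple[int, int], List[Tuple[float, int]]] = {
    (0, 0): [(1.0, 0), (2.0, 1)],
    (0, 1): [(1.5, 0)],
    (1, 0): [(0.0, 0), (3.0, 1)],
    (1, 1): [(2.0, 0), (2.0, 1)],
}


def worst_case(v: Dict[int, float], s: int, u: int) -> float:
    """Largest cost-to-go over all feasible outcomes of (s, u)."""
    return max(c + GAMMA * v[s_next] for c, s_next in FEASIBLE[(s, u)])


def bellman(v: Dict[int, float]) -> Dict[int, float]:
    """One application of the min-max operator to the value estimate v."""
    return {s: min(worst_case(v, s, u) for u in ACTIONS) for s in STATES}


def value_iteration(tol: float = 1e-10, max_iter: int = 10_000) -> Dict[int, float]:
    """Iterate the operator from zero until successive estimates agree."""
    v = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        new_v = bellman(v)
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v
    return v


def greedy_policy(v: Dict[int, float]) -> Dict[int, int]:
    """Action minimizing the worst-case cost-to-go in each state."""
    return {s: min(ACTIONS, key=lambda u: worst_case(v, s, u)) for s in STATES}


if __name__ == "__main__":
    v_star = value_iteration()
    print(v_star, greedy_policy(v_star))
```

On this toy model the iteration converges to \(V(0) = 15\) and \(V(1) = 20\), with action 1 selected in both states. Because the min-max operator is a \(\gamma\)-contraction in the sup norm, the starting estimate does not affect the limit.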