Information and Control: Witsenhausen Revisited
Sanjoy Mitter
A. Sahai
* This research supported by U.S. Army Grant PAAL03-92-G-0115, Center for Intelligent Control Systems.
† Department of Electrical Engineering and Computer Science and Laboratory for Information and Decision Systems, Massachusetts Institute of Technology. mitter@lids.mit.edu, sahai@mit.edu

1 Introduction

In traditional information theory, a technical notion of information is developed that is independent of the actual use of that information. Aside from its considerable aesthetic appeal, this body of ideas has proven itself to be quite useful in the context of information transmission. However, fundamental to most of the results in information theory is the use of long block lengths, with sequence lengths tending to infinity so that the laws of large numbers can work to reduce uncertainty. In a control context, the focus is on the present. While there is a sense in which all of feedback control is about trying to reduce uncertainty, a control action must be applied now and we cannot afford to wait forever.

In this report, we will attempt to get a handle on the role of information in control by revisiting two classic papers. The first of these is Witsenhausen's 1971 survey paper [4] on the "Separation of Estimation and Control for Discrete Time Systems." Here, we will give the essentials of Witsenhausen's framework for talking about stochastic control problems. The key idea is that of information patterns, a formal way of talking about the issue of "who knows what and when do they know it." Using this, we will restate his main assertions on the various forms of separation between estimation and control. Though the language is general, we will quickly find ourselves talking about linear systems with quadratic costs and Gaussian distributions for the primitive random variables: the LQG problem.
With the basics behind us, we next consider Witsenhausen's 1968 "Counterexample In Stochastic Optimum Control" [3], which shows how important the information pattern really is to the control problem. It details a deceptively simple 2-stage LQG problem and shows that when you restrict to memoryless control, affine¹ controllers are no longer sufficient to minimize cost. The paper does this by computing the best affine controller and then exhibiting a nonlinear control law which does better. We then present a simpler family of nonlinear control laws and use them to get something much stronger: a demonstration that the ratio of the cost of the best affine controller to that of a nonlinear controller can go to infinity!

Then, we try to use ideas from information theory to give some intuition as to why the affine controllers are suboptimal. At its heart, the problem seems to boil down to one of communication between stages 1 and 2. We argue that the restriction to affine controllers is suboptimal because it forces a tension between the complexity of the message and the reliability of its transmission. We show how the nonlinear controller is able to circumvent this tension, achieving better performance.

2 Separation of Estimation and Control

In Witsenhausen's classic survey paper [4], he sets out to elucidate the relationships between estimation and control for discrete time, Bayesian² systems. The fundamental issue stems from the distinction between the control laws and the actual realizations of the control variables applied to the system. The designer chooses the laws to fulfill some objective, and until that choice is made, the control variables are still "random variables to be of yet uncertain status."

2.1 Problem Framework

Witsenhausen considers a general finite-horizon distributed discrete time control problem.
Time goes from 1 to $T$, there are $M$ observation posts³, and $K$ control stations⁴. The causal sequence is as follows:

1. Generation of random initial state $x_0$.
2. Observations of outputs $(y_1^1, \ldots, y_1^M) = (g_1^1(x_0, w_1^1), \ldots, g_1^M(x_0, w_1^M))$.
3. Application of controls $u_1^1, \ldots, u_1^K$.
4. Transition to state $x_1 = f_1(x_0, v_1, u_1^1, \ldots, u_1^K)$.

This then continues until the final state $x_T$ is reached. The uncertainty in the system is modeled by a basic set of independent primitive random variables: $x_0;\ v_t,\ w_t^m$ $(t = 1, \ldots, T;\ m = 1, \ldots, M)$. The $v_t$ enter into the state transition functions $f_t$ and the $w_t^m$ into the observation functions $g_t^m$ in the obvious ways. Finally, the preferences between outcomes are expressed consistently through an additive cost function on the state and the controls: $\sum_{t=1}^{T} h_t(x_t, u_t^1, \ldots, u_t^K)$.

The goal of the designer is to pick a design $\gamma$ specifying control laws $\gamma_t^k$ that select the $u_t^k$ to minimize the expected cost. Furthermore, once all the $\gamma_t^k$ are selected, all the variables in the closed loop system become well defined random variables. More technically, given a complete design $\gamma$ and a pair of sets of values for some arbitrary sets of the output and control variables, $Y$ and $U$, we have a clearly defined $\sigma$-field $\mathcal{F}(Y, U; \gamma)$ in probability space and thus conditional distributions⁵ for all the variables in the system⁶.

¹ Linear plus constant.
² All "uncertainty" in the system is modeled probabilistically.
³ For example, consider geographically distributed sensors.
⁴ These usually represent distributed controllers.

2.2 Information Patterns

As stated above, the problem is still incompletely specified. We need to know the sets from which we are allowed to pick the functions $\gamma_t^k$. Stated informally, the key questions are "who knows what when" and "what are they allowed to do with that information?".
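The causal sequence of Section 2.1 can be sketched as a small simulation. This is only an illustration under assumed ingredients: scalar state, additive Gaussian noise, linear transitions and a quadratic stage cost are all choices made here, not part of Witsenhausen's general framework, and the names `simulate`, `expected_cost`, `zero_design` and `feedback_design` are hypothetical. Each control law is also given access to every current output, which is just one possible information pattern.

```python
import random

T, M, K = 3, 2, 2  # horizon, observation posts, control stations (assumed sizes)

def simulate(design, seed=0):
    """Run one realization of the closed loop and return its total cost.

    design[(t, k)] is the control law gamma_t^k; here it maps the list of
    time-t observations to the control u_t^k. The functions f, g, h below
    are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                  # step 1: random initial state x_0
    cost = 0.0
    for t in range(1, T + 1):
        # step 2: y_t^m = g_t^m(x_{t-1}, w_t^m); assumed additive noise
        y = [x + rng.gauss(0.0, 0.5) for _ in range(M)]
        # step 3: u_t^k chosen by the law gamma_t^k
        u = [design[(t, k)](y) for k in range(K)]
        # step 4: x_t = f_t(x_{t-1}, v_t, u_t^1, ..., u_t^K); assumed linear
        x = x + sum(u) + rng.gauss(0.0, 0.1)
        # additive stage cost h_t(x_t, u_t^1, ..., u_t^K); assumed quadratic
        cost += x * x + 0.1 * sum(uk * uk for uk in u)
    return cost

def expected_cost(design, runs=2000):
    """Monte Carlo estimate of the expected cost of a complete design."""
    return sum(simulate(design, seed=s) for s in range(runs)) / runs

# Two complete designs: apply no control, or let each station cancel
# its share of the averaged observation.
zero_design = {(t, k): (lambda y: 0.0)
               for t in range(1, T + 1) for k in range(K)}
feedback_design = {(t, k): (lambda y: -sum(y) / len(y) / K)
                   for t in range(1, T + 1) for k in range(K)}
```

Calling `expected_cost(zero_design)` and `expected_cost(feedback_design)` compares the two designs on the same noise realizations, since every run is seeded and both designs consume the random stream identically. The point of the sketch is that the cost only becomes a well defined random variable once every law in the design has been fixed.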
To formalize the first of these questions, the notion of information pattern is defined. This assigns to every control variable $u_t^k$ two sets $Y_{t,k}$ and $U_{t,k}$ of pairs of indices specifying which observation variables $y_\tau^\mu$ and control variables $u_\theta^\kappa$ the control law $\gamma_t^k$ has access to⁷. Generally, no restriction is put on the functional form or range of $\gamma_t^k$, except the trivial one of saying that it should be measurable over the $\sigma$-field generated by its arguments. However, sometimes it is interesting to restrict attention to jointly affine $\gamma_t^k$.

For the idea of information pattern to be useful, we need a notion of equivalence over it. So, patterns $(Y_{t,k}, U_{t,k})$ and $(\tilde{Y}_{t,k}, \tilde{U}_{t,k})$ are equivalent if for any design $\gamma$ feasible with the first, there is a design $\tilde{\gamma}$ feasible with the second such that every system variable agrees under the two designs almost surely.⁸

⁵ The underlying probability space and measure are determined by the primitive random variables.
⁶ For example, the conditional probability $P(y_{\cdot}^{\cdot} \in [-1, 1] \mid y_{\cdot}^{\cdot} = 7,\ y_3^3 = 5,\ u_{\cdot}^{\cdot} = 0.5;\ \gamma)$ should be defined and make sense.
⁷ To be precise, $\gamma_t^k$ takes as arguments all the $y_\tau^\mu$ and $u_\theta^\kappa$ where $(\tau, \mu) \in Y_{t,k}$ and $(\theta, \kappa) \in U_{t,k}$.

Witsenhausen next defines some classifications of information patterns. A pattern is said to have perfect recall if $Y_{t,k} \subseteq Y_{t+1,k}$ and $U_{t,k} \subseteq U_{t+1,k}$. A pattern is said to be classical if it has perfect recall and moreover $Y_{t,k}$ and $U_{t,k}$ are independent of $k$.⁹ We define two related terms that will also be useful. A pattern is said to be perfectly classical if every station has knowledge of all past outputs and controls. For the common case when the observation posts have a natural identification with the control stations¹⁰, a pattern is said to be locally classical if every station can remember all of its past inputs and outputs.
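The classifications above are set-inclusion conditions, so they can be checked mechanically on a finite pattern. The following is a minimal sketch, assuming a pattern is stored as a dictionary from `(t, k)` to a pair of index sets `(Y_tk, U_tk)`. The representation, the function names and the three example patterns are all hypothetical. In particular, it is assumed here that a station acting at time `t` may see outputs up to time `t` and controls up to time `t - 1`.

```python
T, M, K = 3, 2, 2  # assumed sizes; M == K so posts can be matched to stations

def has_perfect_recall(pattern):
    """Y_{t,k} subset of Y_{t+1,k} and U_{t,k} subset of U_{t+1,k} for all t, k."""
    return all(
        pattern[(t, k)][0] <= pattern[(t + 1, k)][0]
        and pattern[(t, k)][1] <= pattern[(t + 1, k)][1]
        for t in range(1, T) for k in range(1, K + 1)
    )

def is_classical(pattern):
    """Perfect recall, and the sets Y_{t,k}, U_{t,k} do not depend on k."""
    same_for_all_k = all(
        pattern[(t, k)] == pattern[(t, 1)]
        for t in range(1, T + 1) for k in range(1, K + 1)
    )
    return has_perfect_recall(pattern) and same_for_all_k

# Every station knows all outputs so far and all earlier controls.
perfectly_classical = {
    (t, k): ({(tau, m) for tau in range(1, t + 1) for m in range(1, M + 1)},
             {(th, kp) for th in range(1, t) for kp in range(1, K + 1)})
    for t in range(1, T + 1) for k in range(1, K + 1)
}

# Station k remembers only its own post's outputs and its own controls.
locally_classical = {
    (t, k): ({(tau, k) for tau in range(1, t + 1)},
             {(th, k) for th in range(1, t)})
    for t in range(1, T + 1) for k in range(1, K + 1)
}

# Station k sees only its current output and remembers nothing.
memoryless = {
    (t, k): ({(t, k)}, set())
    for t in range(1, T + 1) for k in range(1, K + 1)
}
```

Under these assumptions, `perfectly_classical` passes both checks, `locally_classical` has perfect recall but is not classical because its sets depend on `k`, and `memoryless` fails perfect recall.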
Now, the point of these definitions is to begin to get at the notion that as long as we have information about the relevant past control variables and outputs, we might not need to know all the control laws in order to have well defined random variables. Let $L$ be a set of indices $(\theta, k)$. We use $\gamma_L$ to refer to the restriction of design $\gamma$ to just the laws $\gamma_\theta^k$ with $(\theta, k) \in L$. Now, call a triple $(Y, U, L)$ a field basis if for any two designs $\gamma, \tilde{\gamma}$, $\gamma_L = \tilde{\gamma}_L$ implies $\mathcal{F}(Y, U; \gamma) = \mathcal{F}(Y, U; \tilde{\gamma})$. So, knowledge of the values of these particular $Y$ and $U$ together with knowing the laws $\gamma_L$ is sufficient to understand the underlying probability space.¹¹