Policy gradient methods with model predictive control applied to ball bouncing

Paul Kulchenko, E. Todorov
Abstract: We propose a policy parameterization well suited for control problems that involve both continuous dynamics and discrete events. The key idea is to parameterize the policy using a scalar function defined on the subset of states corresponding to discrete events. This function approximates the cost-to-go with respect to some master cost. Once the function is given, we define the policy using model-predictive control (MPC) extended to a first-exit setting: instead of optimizing to a predefined horizon, we optimize up to the next discrete event (ball-paddle contact). The proposed parameterization relies on numerical optimization to obtain the actual policy, as opposed to evaluating an explicit formula, and has the advantage of being more compact and focusing on the aspects of the task. Once the policy has been defined, we simulate it using "quenched" noise, and improve the parameters of the function via gradient descent on the resulting average master cost. We apply this method to the task of two-ball juggling on the same paddle and analyze its performance using a simulated model of the system.
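The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical one-ball, one-dimensional bounce model, a two-parameter quadratic function on contact states, a closed-form minimizer standing in for the numerical optimizer, and finite differences standing in for whatever gradient estimate the paper uses. All names and constants (`E`, `R`, `V_TARGET`, `mpc_paddle_velocity`, `master_cost`) are illustrative.

```python
import numpy as np

E = 0.8          # coefficient of restitution (assumed value)
R = 1.0          # control-effort weight in the MPC objective (assumed value)
V_TARGET = 4.0   # post-impact speed rewarded by the master cost (assumed value)


def mpc_paddle_velocity(v_in, theta):
    """First-exit MPC step for the toy model.

    Minimizes R*u**2 + a*(v_out - b)**2 over the paddle velocity u, where
    v_out = E*v_in + (1 + E)*u is the ball speed right after contact and
    a*(v_out - b)**2 is the parameterized function on contact states.
    The objective is quadratic in u, so the minimizer is written in closed
    form here; a general model would call a numerical optimizer instead.
    """
    a, b = np.exp(theta[0]), theta[1]   # exp keeps the weight a positive
    return a * (1 + E) * (b - E * v_in) / (R + a * (1 + E) ** 2)


def master_cost(theta, noise, v0=2.0):
    """Average master cost over rollouts driven by pre-sampled noise."""
    v = np.full(noise.shape[0], v0)     # impact speed of each rollout
    total = 0.0
    for t in range(noise.shape[1]):     # one iteration per ball-paddle contact
        u = mpc_paddle_velocity(v, theta)
        # Drag-free flight: the ball returns with the speed it left with.
        v = E * v + (1 + E) * u + noise[:, t]
        total += np.mean((v - V_TARGET) ** 2)
    return total / noise.shape[1]


def finite_difference_gradient(theta, noise, eps=1e-4):
    """Central-difference gradient of the master cost with respect to theta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (master_cost(theta + step, noise)
                   - master_cost(theta - step, noise)) / (2 * eps)
    return grad


# "Quenched" noise: sampled once and reused for every parameter setting.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.1, size=(32, 10))   # 32 rollouts, 10 contacts each

theta = np.array([np.log(0.5), 3.0])          # deliberately poor initial guess
initial_cost = master_cost(theta, noise)
for _ in range(200):
    theta = theta - 0.1 * finite_difference_gradient(theta, noise)
final_cost = master_cost(theta, noise)
```

Because the same noise array is reused for every evaluation, the average master cost is a deterministic function of `theta`, which is what makes the finite-difference gradient meaningful in this sketch.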