Planning with General Objective Functions: Going Beyond Total Rewards

Ruosong Wang, Peilin Zhong, Simon S. Du, Russ R. Salakhutdinov, Lin F. Yang
2020-01-01
Abstract: Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with an unknown environment, i.e., maximize $\sum_{h=1}^{H} r_h$, where $H$ is the planning horizon. However, this paradigm fails to model important practical applications, e.g., safe control that aims to maximize the lowest reward, i.e., maximize $\min_{h=1}^{H} r_h$. In this paper, based on techniques from sketching algorithms, we propose a novel planning algorithm for deterministic systems that handles a large class of objective functions of the form $f(r_1, r_2, \ldots, r_H)$ that are of interest in practical applications. We show that efficient planning is possible if $f$ is symmetric under permutation of coordinates and satisfies certain technical conditions. Complementing our algorithm, we further prove that removing any of these conditions makes the problem intractable in the worst case, which demonstrates the necessity of our conditions.
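To make the difference between the two objectives concrete, the following is a minimal sketch in Python. It is not the sketching-based algorithm proposed in the paper: it is a brute-force search over all action sequences, which takes time exponential in the horizon. The toy system (`TRANSITION`, `REWARD`) and the helper names `rollout` and `plan` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy deterministic system (an assumption, not from the paper):
# three states 0..2 and two actions 0/1.
# TRANSITION[s][a] is the next state, REWARD[s][a] is the reward received.
TRANSITION = {0: (1, 2), 1: (0, 2), 2: (2, 0)}
REWARD = {0: (1.0, 5.0), 1: (3.0, 0.0), 2: (0.5, 0.25)}


def rollout(start, actions):
    """Return the reward sequence (r_1, ..., r_H) obtained by following `actions`."""
    state, rewards = start, []
    for a in actions:
        rewards.append(REWARD[state][a])
        state = TRANSITION[state][a]
    return rewards


def plan(start, horizon, objective):
    """Exhaustively search all 2^H action sequences and return one that
    maximizes objective(r_1, ..., r_H). Exponential time; illustration only."""
    return max(
        product((0, 1), repeat=horizon),
        key=lambda actions: objective(rollout(start, actions)),
    )


# Total-reward objective versus the "safe control" objective (lowest reward).
best_sum = plan(0, 3, sum)
best_min = plan(0, 3, min)

print("max-sum plan:", best_sum, rollout(0, best_sum))
print("max-min plan:", best_min, rollout(0, best_min))
```

In this toy system the plan that maximizes the total reward passes through a low-reward step, while the plan that maximizes the minimum reward avoids it at the cost of a smaller total. Both `sum` and `min` are symmetric under permutation of the rewards, which is the structural property the abstract requires of $f$.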