Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process

Bing-Kun Bao, Bao-Qun Yin, Hong-Sheng Xi
DOI: https://doi.org/10.1109/icicic.2008.318
2008-01-01
Abstract: A novel infinite-horizon policy-gradient estimation method with a variable discount factor is proposed in this paper. Conventional policy-gradient estimation methods struggle to balance bias against variance; the proposed method addresses this limitation by using an increasing sequence as the discount factor. Numerical experiments on a Markov decision process demonstrate its effectiveness.
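The abstract does not give the algorithm or the form of the discount sequence, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a GPOMDP-style eligibility-trace estimator in which the fixed discount factor is replaced by a time-varying one. The schedule `discount_schedule`, its parameters `beta0` and `rate`, the softmax policy, and the tabular MDP inputs are all hypothetical choices made for illustration.

```python
import math
import random


def softmax_probs(theta_row):
    """Action probabilities of a softmax policy for one state."""
    m = max(theta_row)
    exps = [math.exp(v - m) for v in theta_row]
    total = sum(exps)
    return [e / total for e in exps]


def discount_schedule(t, beta0=0.5, rate=0.01):
    """Hypothetical increasing schedule (an assumption, not from the paper).

    Starts at beta0 and approaches 1 from below as t grows.
    """
    return 1.0 - (1.0 - beta0) / (1.0 + rate * t)


def estimate_gradient(theta, transition, reward, steps=20000, seed=0):
    """Eligibility-trace gradient estimate with a time-varying discount.

    theta[s][a]      : softmax policy parameters
    transition[s][a] : list of next-state probabilities
    reward[s]        : reward received on entering state s
    """
    rng = random.Random(seed)
    n_states, n_actions = len(theta), len(theta[0])
    trace = [[0.0] * n_actions for _ in range(n_states)]
    grad = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for t in range(steps):
        probs = softmax_probs(theta[state])
        action = rng.choices(range(n_actions), weights=probs)[0]
        beta_t = discount_schedule(t)
        # Decay the whole trace by the current discount factor.
        for s in range(n_states):
            for a in range(n_actions):
                trace[s][a] *= beta_t
        # Add the score: d log pi(action | state) / d theta[state][a].
        for a in range(n_actions):
            trace[state][a] += (1.0 if a == action else 0.0) - probs[a]
        next_state = rng.choices(
            range(n_states), weights=transition[state][action]
        )[0]
        r = reward[next_state]
        # Running average of reward times eligibility trace.
        for s in range(n_states):
            for a in range(n_actions):
                grad[s][a] += (r * trace[s][a] - grad[s][a]) / (t + 1)
        state = next_state
    return grad


if __name__ == "__main__":
    # Toy two-state, two-action MDP (made-up numbers).
    theta = [[0.0, 0.0], [0.0, 0.0]]
    transition = [
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.8, 0.2], [0.2, 0.8]],
    ]
    reward = [0.0, 1.0]
    print(estimate_gradient(theta, transition, reward))
```

In this sketch, a discount factor well below 1 early on keeps the trace short, which limits variance, and letting it approach 1 later is intended to reduce the bias that a fixed small discount would leave. How the paper actually chooses the sequence is not stated in the abstract.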