Measuring the Distance Between Finite Markov Decision Processes

Jinhua Song,Yang Gao,Hao Wang,Bo An
DOI: https://doi.org/10.5555/2936924.2936994
2016-01-01
Abstract:Markov decision processes (MDPs) have been studied for many decades. Recent research in using transfer learning methods to solve MDPs has shown that knowledge learned from one MDP may be used to solve a similar MDP better. In this paper, we propose two metrics for measuring the distance between finite MDPs. Our metrics are based on the Hausdorff metric which measures the distance between two subsets of a metric space and the Kantorovich metric for measuring the distance between probabilistic distributions. Our metrics can be used to compute the distance between reinforcement learning tasks that are modeled as MDPs. The second contribution of this paper is that we apply the metrics to direct transfer learning by finding the similar source tasks. Our third contribution is that we propose two knowledge transfer methods which transfer value functions of the selected source tasks to the target task. Extensive experimental results show that our metrics are effective in finding similar tasks and significantly improve the performance of transfer learning with the transfer methods.
What problem does this paper attempt to address?