A Cantor-Kantorovich Metric Between Markov Decision Processes with Application to Transfer Learning

Adrien Banse, Venkatraman Renganathan, Raphaël M. Jungers
2024-07-11
Abstract: We extend the notion of Cantor-Kantorovich distance between Markov chains introduced by Banse et al. (2023) to Markov Decision Processes (MDPs). The proposed metric is well-defined and can be efficiently approximated given a finite horizon. We then provide numerical evidence that this metric can lead to interesting applications in the field of reinforcement learning. In particular, we show that it could be used to forecast the performance of transfer learning algorithms.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the following issues:

1. **Definition and Computation of the Cantor-Kantorovich Metric**: The paper extends the Cantor-Kantorovich distance between Markov chains, proposed by Banse et al. (2023), to Markov Decision Processes (MDPs) and shows that the resulting metric can be approximated efficiently over a finite horizon.
2. **Application of the Metric in Reinforcement Learning**: The paper demonstrates the metric's potential use in Transfer Learning (TL). Specifically, the authors show that it can be used to forecast the performance of transfer learning algorithms between different MDPs.
3. **Addressing the Similarity Metric Problem in Transfer Learning**: Previous research has shown that transfer learning algorithms generally perform better when the source MDP and the target MDP are more similar, but a metric that is both easy to compute and an accurate reflection of this similarity has been lacking. The proposed Cantor-Kantorovich metric aims to fill this gap.

In summary, the main contribution of the paper is the extension of the Cantor-Kantorovich metric to MDPs, together with evidence that it can assess the similarity between source and target tasks in transfer learning scenarios, which can help anticipate how well transfer will perform. The paper supports these claims with numerical experiments.
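To make the idea of a finite-horizon Cantor-Kantorovich distance more concrete, here is a minimal Python sketch. It is not the authors' algorithm, and it rests on several assumptions that do not come from the text above: each MDP is reduced to a Markov chain by fixing a policy, the Cantor distance between two distinct words is taken to be 2^(-k) with k the length of their common prefix, and the Kantorovich distance is evaluated with the standard closed form for tree metrics (a weighted sum of absolute differences in prefix probabilities). The function names `word_probability` and `cantor_kantorovich` are hypothetical, and the brute-force enumeration of words is exponential in the horizon, so it only illustrates the quantity on tiny examples.

```python
from itertools import product


def word_probability(mu, P, word):
    """Probability that a Markov chain (initial law mu, transition matrix P)
    emits the given finite sequence of states."""
    prob = mu[word[0]]
    for s, t in zip(word, word[1:]):
        prob *= P[s][t]
    return prob


def cantor_kantorovich(mu_a, P_a, mu_b, P_b, horizon):
    """Kantorovich distance between the laws of length-`horizon` words of two
    chains on the same state space, under the ASSUMED Cantor distance
    d(u, v) = 2**(-k), where k is the length of the common prefix of u and v.

    Uses the closed form for tree metrics: each prefix of length k contributes
    |p(prefix) - q(prefix)| times the weight of the tree edge above it.
    """
    n = len(mu_a)
    total = 0.0
    for k in range(1, horizon + 1):
        # Edge weights of the prefix tree: 2^-(k+1) for internal levels,
        # 2^-horizon for the final (leaf) level.
        weight = 2.0 ** -(k + 1) if k < horizon else 2.0 ** -horizon
        level_gap = sum(
            abs(word_probability(mu_a, P_a, w) - word_probability(mu_b, P_b, w))
            for w in product(range(n), repeat=k)
        )
        total += weight * level_gap
    return total


# Two deterministic chains that agree on the first state and then diverge.
mu = [1.0, 0.0]
stay_in_0 = [[1.0, 0.0], [0.0, 1.0]]
jump_to_1 = [[0.0, 1.0], [0.0, 1.0]]
print(cantor_kantorovich(mu, stay_in_0, mu, jump_to_1, horizon=3))
```

In this toy case the two chains emit the words (0, 0, 0) and (0, 1, 1) with probability one, so the result coincides with the assumed Cantor distance between those two words. A small value of such a distance between a source and a target task is the kind of similarity signal the summary above describes as useful for forecasting transfer performance.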