Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning

Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand
2023-12-04
Abstract: We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence, one which is general and can be used to plan for any risk measure, but is intractable; and a practical variation which allows one to choose which risk measures they may plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its ability.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to learn models that support effective planning in risk-sensitive reinforcement learning. Specifically, the authors focus on learning models for risk-sensitive decision-making rather than for the traditional risk-neutral setting. The main contributions of the paper are as follows:

1. **Show that proper value equivalence suffices only for risk-neutral planning**: The authors prove that, in the risk-sensitive setting, the performance of a proper value equivalent model degrades as risk sensitivity increases (Section 3).
2. **Introduce the distribution equivalence principle**: This notion of equivalence supports optimal planning under any risk measure, but it is intractable in practice (Section 4).
3. **Propose an approximate version of distribution equivalence**: This version lets one choose the specific risk measures for which planning is optimal, which makes it practical (Section 5).
4. **Discuss how to learn such models through loss functions and how to combine them with existing model-free algorithms** (Section 6).
5. **Validate the framework with tabular and large-scale experiments** (Section 7).

### Background of the Paper

Reinforcement learning is a general framework in which agents optimize an objective through sequential decision-making, such as the expected value of future rewards (a risk-neutral objective) or the conditional value-at-risk of future rewards (a risk-sensitive objective). Traditional model-learning methods usually fit an environment model by maximum-likelihood estimation (MLE), but in highly stochastic or safety-critical environments this may not capture all of the environment characteristics that matter for planning.

### Main Contributions

1. **Theoretical Analysis**:
   - The authors first prove that proper value equivalence suffices only for risk-neutral planning. Specifically, a model in the proper value equivalence class can be used to plan optimally in the risk-neutral setting, but it may fail to do so in the risk-sensitive setting (Proposition 3.1).
   - They further show that, when a proper value equivalent model is used to plan for a strictly risk-sensitive spectral risk measure, there is a trade-off between risk sensitivity and performance (Proposition 3.2).
2. **New Equivalence Concepts**:
   - **Distribution equivalence principle**: A notion of model equivalence based on the entire return distribution. A model that matches the return distribution of the true environment supports optimal planning for any risk measure (Theorem 4.3).
   - **Statistical functional equivalence**: Because distribution equivalence is intractable in practice, the authors propose statistical functional equivalence, which lets one choose the specific risk measures for which planning is optimal (Definition 5.4 and Proposition 5.5).
3. **Experimental Verification**:
   - The authors run experiments in tabular and continuous environments to evaluate the proposed framework. The results indicate that models learned under distribution equivalence or statistical functional equivalence improve performance on risk-sensitive tasks.

### Conclusion

This paper addresses model learning for risk-sensitive reinforcement learning by introducing the distribution equivalence principle and statistical functional equivalence. The former characterizes models that support planning for any risk measure, while the latter gives a practical variant that can be combined with existing model-free risk-sensitive algorithms, as the tabular and large-scale experiments illustrate.
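
To make concrete why matching expected values alone can be insufficient for risk-sensitive planning, here is a minimal sketch in Python. It is not taken from the paper. The return samples, the `cvar` helper, and the choice of lower-tail conditional value-at-risk at level 0.25 are all assumptions made for illustration. It shows a hypothetical learned model whose returns have the same mean as the true environment's but a different tail risk.

```python
import math
from statistics import fmean


def cvar(returns, alpha):
    """Empirical lower-tail CVaR: mean of the worst alpha-fraction of returns.

    Hypothetical helper for illustration; assumes equally weighted samples.
    """
    ordered = sorted(returns)
    # Number of samples in the lower tail (at least one).
    k = max(1, math.ceil(alpha * len(ordered)))
    return fmean(ordered[:k])


# Hypothetical return samples (illustrative numbers, not experimental data).
true_returns = [-10.0, 0.0, 0.0, 10.0]   # returns under the true environment
model_returns = [-1.0, 0.0, 0.0, 1.0]    # returns under a learned model

# Risk-neutral objective: both distributions have the same expected return.
print(fmean(true_returns), fmean(model_returns))

# Risk-sensitive objective: the lower-tail CVaR differs between the two.
print(cvar(true_returns, 0.25), cvar(model_returns, 0.25))
```

In this toy case a planner that only compares expected returns cannot tell the two distributions apart, whereas one that evaluates CVaR can. This is the kind of gap that equivalence notions based on the return distribution, or on selected statistics of it, are meant to close.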