MDP environments for the OpenAI Gym

Andreas Kirsch
DOI: https://doi.org/10.48550/arXiv.1709.09069
2017-09-26
Abstract: The OpenAI Gym provides researchers and enthusiasts with simple-to-use environments for reinforcement learning. Even the simplest environments have a level of complexity that can obfuscate the inner workings of RL approaches and make debugging difficult. This whitepaper describes a Python framework that makes it very easy to create simple Markov-Decision-Process environments programmatically by specifying state transitions and rewards of deterministic and non-deterministic MDPs in a domain-specific language in Python. It then presents results and visualizations created with this MDP framework.
Machine Learning
What problem does this paper attempt to address?
This paper aims to simplify the creation of Markov Decision Process (MDP) environments for Reinforcement Learning (RL) research and development. The author points out that even the simplest environments in OpenAI Gym carry enough complexity to make it difficult to debug RL algorithms and understand how they work internally. To address this, the author proposes a Python framework that lets users create deterministic and non-deterministic MDP environments by specifying state transitions and rewards. The main objectives of this framework include:

1. **Simplify the creation of MDP environments**: An easy-to-use Domain-Specific Language (DSL) lets users define the states, actions, transition probabilities, and reward functions of an MDP.
2. **Improve debugging efficiency**: The MDP environment can be converted into a visual graph and is compatible with OpenAI Gym, so researchers can understand and debug their reinforcement learning models more intuitively.
3. **Verification and analysis**: Linear programming is used to calculate the optimal value function, helping researchers verify the correctness of their reinforcement learning algorithms and analyze other related properties.

The core contribution of this paper is therefore a tool that makes it simpler and more efficient to create and debug MDP environments for reinforcement learning research.
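To make the idea of specifying an MDP through its transitions and rewards concrete, here is a minimal sketch in plain Python. It does not use the paper's DSL or the `gym` package: the `TRANSITIONS` table format, the `TinyMDPEnv` class, and the `value_iteration` helper are all hypothetical names chosen for illustration. The `reset`/`step` methods only mirror the classic Gym convention of returning `(observation, reward, done, info)`. The summary above says the framework computes the optimal value function by linear programming; this sketch swaps in value iteration, which should converge to the same values on a small finite MDP like this one.

```python
import random

# Hypothetical specification format (NOT the paper's DSL):
# TRANSITIONS[state][action] -> list of (probability, next_state, reward).
# Any state without an entry (here "end") is treated as terminal.
TRANSITIONS = {
    "s0": {
        "left": [(1.0, "end", 1.0)],                      # deterministic
        "right": [(0.5, "s1", 0.0), (0.5, "end", 0.0)],   # non-deterministic
    },
    "s1": {
        "left": [(1.0, "end", 0.0)],
        "right": [(1.0, "end", 4.0)],
    },
}


class TinyMDPEnv:
    """Gym-style reset/step interface over a transition table (illustrative)."""

    def __init__(self, transitions, start_state, seed=None):
        self.transitions = transitions
        self.start_state = start_state
        self.rng = random.Random(seed)
        self.state = start_state

    def reset(self):
        self.state = self.start_state
        return self.state

    def step(self, action):
        # Sample one outcome according to the listed probabilities.
        outcomes = self.transitions[self.state][action]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = self.rng.choices(outcomes, weights=weights)[0]
        self.state = next_state
        done = next_state not in self.transitions
        return next_state, reward, done, {}


def value_iteration(transitions, gamma=1.0, tol=1e-10, max_sweeps=10_000):
    """Approximate the optimal state values by repeated Bellman backups."""
    states = set(transitions)
    for actions in transitions.values():
        for outcomes in actions.values():
            states.update(s for _, s, _ in outcomes)
    values = {s: 0.0 for s in states}  # terminal states stay at 0

    for _ in range(max_sweeps):
        delta = 0.0
        for state, actions in transitions.items():
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    return values


optimal_values = value_iteration(TRANSITIONS)
env = TinyMDPEnv(TRANSITIONS, start_state="s0", seed=0)
first_obs = env.reset()
left_result = env.step("left")  # single outcome, so no randomness involved
```

Because every state and reward is visible in one small table, the values an RL agent learns on such an environment can be compared directly against `optimal_values`, which is the kind of verification the summary describes.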