Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning

Weichen Li, Rati Devidze, Sophie Fellenz
2023-06-27
Abstract: Text-based games are a popular testbed for language-based reinforcement learning (RL). In previous work, deep Q-learning is commonly used as the learning agent. Q-learning algorithms are challenging to apply to complex real-world domains due to, for example, their instability in training. Therefore, in this paper, we adapt the soft actor-critic (SAC) algorithm to the text-based environment. To deal with sparse extrinsic rewards from the environment, we combine it with a potential-based reward shaping technique to provide more informative (dense) reward signals to the RL agent. We apply our method to play difficult text-based games. The SAC method achieves higher scores than the Q-learning methods on many games with only half the number of training steps. This shows that it is well-suited for text-based games. Moreover, we show that the reward shaping technique helps the agent to learn the policy faster and achieve higher scores. In particular, we consider a dynamically learned value function as a potential function for shaping the learner's original sparse reward signals.
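The shaping idea in the abstract can be sketched in a few lines. In potential-based reward shaping, the agent is trained on r + γΦ(s') - Φ(s) instead of r, where Φ is a potential function over states; following the abstract, Φ here is a learned state-value estimate. This is an illustrative sketch under stated assumptions, not the authors' implementation: the tabular value dictionary and the `td0_update` helper are hypothetical stand-ins for the learned critic, and states are plain hashable keys (such as room names) rather than encoded text observations.

```python
from typing import Callable, Dict, Hashable

State = Hashable


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    done: bool,
    potential: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Return r + F(s, s'), where F(s, s') = gamma * Phi(s') - Phi(s).

    The potential of a terminal next state is treated as 0, so over an
    episode the discounted shaping terms telescope to -Phi(s_0).
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def td0_update(
    values: Dict[State, float],
    state: State,
    reward: float,
    next_state: State,
    done: bool,
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> None:
    """Hypothetical stand-in for the learned critic: one tabular TD(0) step.

    Assumption: V(s) is moved toward r + gamma * V(s') using the original
    (unshaped) reward, so the potential keeps changing as the agent learns.
    """
    bootstrap = 0.0 if done else gamma * values.get(next_state, 0.0)
    current = values.get(state, 0.0)
    values[state] = current + alpha * (reward + bootstrap - current)
```

For example, with value estimates `{"kitchen": 0.5, "cellar": 2.0}` and γ = 0.9, a zero-reward move from `kitchen` to `cellar` receives a shaped reward of 0.9 · 2.0 - 0.5 = 1.3, so the sparse environment signal is supplemented by the estimated gain in value. Because the shaping terms telescope, they add a dense learning signal without changing which policy is optimal.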
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is reinforcement learning (RL) in text-adventure games. Specifically, the authors study how to apply maximum-entropy reinforcement learning, in particular the Soft Actor-Critic (SAC) algorithm, to these games in order to raise game scores and reduce the number of training steps required. The paper notes that although deep Q-learning is widely used, Q-learning algorithms suffer from problems such as unstable training and are difficult to apply to complex real-world domains. The authors therefore propose SAC combined with potential-based reward shaping, which provides denser reward signals to the agent, speeding up learning and improving performance.

### Main contributions of the paper:

1. **Using SAC for text-adventure games**: SAC is proposed as an alternative to deep Q-learning, motivated by the training instability of Q-learning algorithms.
2. **Introducing reward shaping**: A potential-based reward shaping method for discrete action spaces is designed, using a dynamically learned value function as the potential, to accelerate learning.
3. **Experimental comparison**: On many games, SAC achieves higher scores than deep Q-learning methods with only half the number of training steps.
4. **Faster convergence**: On some games, reward shaping is shown to make the agent converge faster.

### Specific problems addressed:

- **Sparse rewards**: Text-adventure games usually provide sparse extrinsic rewards, so, especially early in training, an agent must take many actions before receiving any feedback. The problem is particularly severe in these games because their action spaces are large and context-dependent. Reward shaping lets the agent extract useful learning signal from the environment sooner, which accelerates learning.
- **Large, changing action spaces**: The action spaces of text-adventure games are not only large but also change from state to state, which makes them hard for conventional RL methods to handle. The authors adapt SAC to this discrete setting with the aim of coping better with such changing action spaces.

### Conclusion:

The paper's experiments support the effectiveness of SAC with reward shaping in text-adventure games, particularly for dealing with sparse rewards and dynamic action spaces, and suggest that SAC is a good choice for this domain.
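The action-space point above can be made concrete with the discrete-action form of SAC that is commonly used when actions can be enumerated: instead of sampling actions, expectations over actions are computed exactly from the policy's probabilities over the currently valid actions. The sketch below assumes that form (the paper's exact losses may differ). It represents one state's Q-values and policy logits as plain Python lists whose length is the number of valid actions in that state, and computes only loss and target values; a real agent would use an autodiff framework and hold the Q-values fixed in the actor loss. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the valid actions of one state."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(q_values: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).

    Terms with pi(a|s) == 0 contribute nothing (p * log p -> 0 as p -> 0).
    """
    return sum(p * (q - alpha * math.log(p)) for q, p in zip(q_values, probs) if p > 0.0)


def critic_target(
    reward: float,
    next_q_values: Sequence[float],
    next_logits: Sequence[float],
    done: bool,
    alpha: float,
    gamma: float = 0.99,
) -> float:
    """Soft Bellman target y = r + gamma * V(s'), with V(s') = 0 at episode end.

    In a full implementation next_q_values would come from a target critic.
    """
    if done:
        return reward
    return reward + gamma * soft_state_value(next_q_values, softmax(next_logits), alpha)


def policy_loss(q_values: Sequence[float], logits: Sequence[float], alpha: float) -> float:
    """Actor loss for one state: sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s,a)) = -V(s)."""
    return -soft_state_value(q_values, softmax(logits), alpha)
```

Because every function takes per-state lists, a state with three valid commands and a next state with only two are handled the same way, which is one reason this discrete form suits games whose valid-action sets change at every step. The temperature `alpha` trades expected Q-value against policy entropy; SAC implementations often tune it automatically, which this sketch omits.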