Abstract: Text-based games are a popular testbed for language-based reinforcement learning (RL). In previous work, deep Q-learning is commonly used as the learning agent. Q-learning algorithms are challenging to apply to complex real-world domains due to, for example, their instability in training. Therefore, in this paper, we adapt the soft-actor-critic (SAC) algorithm to the text-based environment. To deal with sparse extrinsic rewards from the environment, we combine it with a potential-based reward shaping technique to provide more informative (dense) reward signals to the RL agent. We apply our method to play difficult text-based games. The SAC method achieves higher scores than the Q-learning methods on many games with only half the number of training steps. This shows that it is well-suited for text-based games. Moreover, we show that the reward shaping technique helps the agent to learn the policy faster and achieve higher scores. In particular, we consider a dynamically learned value function as a potential function for shaping the learner's original sparse reward signals.
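
For readers unfamiliar with the technique, the standard form of potential-based reward shaping can be written as below. This is the textbook definition, not a formula quoted from the paper; the symbols $\Phi$ (potential function), $\gamma$ (discount factor), $F$ (shaping term) and $\tilde{r}$ (shaped reward) are notation assumed here. The abstract's choice of a learned value function as the potential would correspond to setting $\Phi(s) = V(s)$ with the current value estimate; the paper's exact formulation may differ in details.

$$
F(s, a, s') = \gamma\,\Phi(s') - \Phi(s), \qquad \tilde{r}(s, a, s') = r(s, a, s') + F(s, a, s')
$$

Because $F$ is a difference of potentials, its discounted sum along a trajectory telescopes, which is why a fixed potential leaves the optimal policy unchanged while still giving the agent a signal on steps where $r = 0$.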

What problem does this paper attempt to address?

The paper addresses reinforcement learning (RL) in text-based adventure games. Specifically, the authors study how to apply a maximum-entropy RL algorithm, Soft Actor-Critic (SAC), to these games in order to raise game scores and reduce the number of training steps required. The paper notes that although deep Q-learning is widely used, it suffers from unstable training and is difficult to apply to complex real-world domains. The authors therefore propose using SAC in combination with potential-based reward shaping, which gives the RL agent denser reward signals, thereby speeding up learning and improving performance.

### Main contributions of the paper:

1. **Proposing the use of the SAC algorithm**: SAC is used for text-based adventure games as an alternative to deep Q-learning, addressing the training instability of Q-learning algorithms.
2. **Introducing the reward shaping technique**: The authors design a potential-based reward shaping method for discrete action spaces that helps accelerate learning.
3. **Experimental comparison**: Compared with deep Q-learning methods, SAC achieves higher scores on many games with about half the number of training steps.
4. **Accelerating convergence**: For some games, the experiments show that reward shaping leads to faster convergence.

### Specific problems solved:

- **Sparse reward problem**: Text-based adventure games usually have sparse extrinsic rewards. Especially early in training, an agent must take many actions before receiving any feedback. The problem is particularly severe in these games because their action spaces are large and context-dependent. Reward shaping lets the agent obtain useful learning signals from the environment sooner and so accelerates learning.
- **Challenges of action spaces**: The action spaces of text-based adventure games are not only large but also change dynamically, which makes them difficult for traditional RL methods to handle. SAC, with the authors' adaptations, copes better with such dynamically changing action spaces.

### Conclusion:

The paper experimentally verifies the effectiveness of SAC and the accompanying reward shaping technique in text-based adventure games, especially for handling sparse rewards and dynamic action spaces. The results indicate that SAC is a good choice for text-based adventure games.
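
The two ingredients described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default `gamma` and `alpha` values, and the convention that a terminal state has zero potential are all choices made here for the example. `shaped_reward` uses a value estimate as the potential, and `soft_state_value` computes the entropy-regularized state value used in discrete-action SAC over whatever set of actions is admissible in the current state, so the list length may vary from state to state.

```python
import math
from typing import List, Sequence


def shaped_reward(reward: float, value_s: float, value_next: float,
                  gamma: float = 0.99, done: bool = False) -> float:
    """Potential-based shaping with a value estimate as the potential.

    Returns r + gamma * V(s') - V(s). Assumption: the potential of a
    terminal state is taken to be zero.
    """
    next_potential = 0.0 if done else value_next
    return reward + gamma * next_potential - value_s


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the admissible actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(q_values: Sequence[float], logits: Sequence[float],
                     alpha: float = 0.2) -> float:
    """Discrete-action soft value: sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).

    q_values and logits must list the same admissible actions in the same order.
    Actions whose probability underflows to zero contribute nothing.
    """
    probs = softmax(logits)
    return sum(p * (q - alpha * math.log(p))
               for p, q in zip(probs, q_values) if p > 0.0)


if __name__ == "__main__":
    # A step with zero extrinsic reward still yields a signal when V increases.
    print(shaped_reward(0.0, value_s=1.0, value_next=2.0, gamma=0.9))
    # Two admissible actions with equal preference and equal Q-values.
    print(soft_state_value([1.0, 1.0], [0.0, 0.0], alpha=1.0))
```

In a full agent the value estimates and logits would come from learned networks conditioned on the game text, and the temperature `alpha` controls how strongly the policy is pushed toward exploring among the admissible actions.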

Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning

Self-play Reinforcement Learning with Comprehensive Critic in Computer Games

Abstract then Play: A Skill-centric Reinforcement Learning Framework for Text-based Games.

Using reinforcement learning to learn how to play text-based games

Reinforcement Learning For Constraint Satisfaction Game Agents (15-Puzzle, Minesweeper, 2048, and Sudoku)

Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games

Zero-Shot Learning of Text Adventure Games with Sentence-Level Semantics

Language Understanding for Text-based Games Using Deep Reinforcement Learning

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Generalizing soft actor-critic algorithms to discrete action spaces

DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning

Generalization in Text-based Games via Hierarchical Reinforcement Learning

An Analysis of Deep Reinforcement Learning Agents for Text-based Games

Towards automating Codenames spymasters with deep reinforcement learning

Perceiving the World: Question-guided Reinforcement Learning for Text-based Games

An Effective Maximum Entropy Exploration Approach for Deceptive Game in Reinforcement Learning.

Reward Space Noise for Exploration in Deep Reinforcement Learning

Revisiting the Roles of "Text" in Text Games

Revisiting Discrete Soft Actor-Critic

STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning