Abstract: Text-based reinforcement learning involves an agent interacting with a fictional environment using observed text and admissible actions in natural language to complete a task. Previous works have shown that agents can succeed in text-based interactive environments even in the complete absence of semantic understanding or other linguistic capabilities. The success of these agents in playing such games suggests that semantic understanding may not be important for the task. This raises an important question about the benefits of LMs in guiding the agents through the game states. In this work, we show that rich semantic understanding leads to efficient training of text-based RL agents. Moreover, we describe the occurrence of semantic degeneration as a consequence of inappropriate fine-tuning of language models in text-based reinforcement learning (TBRL). Specifically, we describe the shift in the semantic representation of words in the LM, as well as how it affects the performance of the agent in tasks that are semantically similar to the training games. We believe these results may help develop better strategies to fine-tune agents in text-based RL scenarios.

What problem does this paper attempt to address?

This paper evaluates how fine-tuning language models (LMs) affects the training efficiency of agents in text-based reinforcement learning (TBRL), and whether such fine-tuning leads to semantic degeneration that hurts agent performance on tasks containing out-of-training-set vocabulary. Specifically, the paper explores two questions:

1. **Can fine-tuning language models improve training efficiency?** The paper experimentally compares fixed pre-trained language models with fine-tuned language models during training to evaluate the impact of fine-tuning on training efficiency.
2. **Will fine-tuning language models enable agents to handle tasks containing out-of-training-set vocabulary more robustly?** The researchers test different models on game versions in which the observation descriptions are rewritten or the vocabulary is replaced, to evaluate the impact of fine-tuning on the agents' generalization ability.

The paper finds that although fine-tuning language models can accelerate agent learning on specific tasks, it may lead to semantic degeneration: the language model "forgets" the semantic associations it learned during pre-training, which makes agents perform poorly when they encounter slightly different or unseen vocabulary. A fixed pre-trained language model may therefore better support the agents' generalization ability while preserving semantic information. These results can inform better strategies for fine-tuning agents in text-based reinforcement learning scenarios.
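One way to picture the semantic degeneration described above is as a drop in similarity between a word seen during training and a related word that never appears in the training games. The sketch below is a minimal illustration only, not the paper's measurement procedure: the function names, the word pair, and the two-dimensional vectors are all hypothetical and chosen by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_shift(before, after, word, neighbor):
    """Change in cosine similarity of a word pair after fine-tuning.

    A negative value means the pair drifted apart, i.e. a semantic
    association from pre-training was weakened.
    """
    return cosine(after[word], after[neighbor]) - cosine(before[word], before[neighbor])


# Hypothetical toy embeddings (made up for illustration, not real data).
# "door" is assumed to appear in the training games; "gate" is assumed unseen.
pretrained = {"door": [1.0, 0.0], "gate": [0.8, 0.6]}
# Assumption: fine-tuning moves the seen word and leaves the unseen one alone.
fine_tuned = {"door": [-0.6, 0.8], "gate": [0.8, 0.6]}

shift = similarity_shift(pretrained, fine_tuned, "door", "gate")
print(f"similarity shift for (door, gate): {shift:.2f}")
```

In this toy setup the shift is negative, which corresponds to the situation the summary describes: an agent that learned to act on "door" gets less help from the encoder when a rewritten game says "gate" instead.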

On the Effects of Fine-tuning Language Models for Text-Based Reinforcement Learning

AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback

Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting

Revisiting the Roles of "Text" in Text Games

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models

Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models

Enhancing Text-based Reinforcement Learning Agents with Commonsense Knowledge

STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint

Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings

Language Understanding for Text-based Games Using Deep Reinforcement Learning

Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features

Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation

Fine-Tuning Language Models from Human Preferences

A Fine-Tuned Large Language Model for Domain-Specific with Reinforcement Learning

Learning to Generate Better Than Your LLM

Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models

Learning to Modulate pre-trained Models in RL

Words as Beacons: Guiding RL Agents with High-Level Language Prompts