Abstract: Reinforcement learning is a popular method of finding optimal solutions to complex problems. Algorithms like Q-learning excel at learning to solve stochastic problems without a model of their environment. However, they take longer to solve deterministic problems than is necessary. Q-learning can be improved to better solve deterministic problems by introducing such a model-based approach. This paper introduces the recursive backwards Q-learning (RBQL) agent, which explores and builds a model of the environment. After reaching a terminal state, it recursively propagates its value backwards through this model. This lets each state be evaluated to its optimal value without a lengthy learning process. In the example of finding the shortest path through a maze, this agent greatly outperforms a regular Q-learning agent.

What problem does this paper attempt to address?

The paper addresses reinforcement learning in deterministic environments, focusing on a shortcoming of the Q-learning algorithm in such problems. Q-learning is well suited to stochastic problems, but it converges more slowly than necessary in deterministic environments because it does not use a model of the environment. The paper proposes a new algorithm, Recursive Backwards Q-Learning (RBQL), which aims to find the optimal policy more quickly by building an environment model and propagating values backwards through it after reaching a terminal state. The RBQL algorithm works as follows:

1. **Exploration and Modeling**: The RBQL agent constructs an environment model during the exploration process.
2. **Backwards Value Propagation**: When a terminal state is reached, the algorithm traverses the explored states backwards and updates the value of each state according to the recursive backwards Q-learning update rules.
3. **Improved Learning Rule**: Setting the learning rate to 1 simplifies the Q-learning update formula, so that the value of each state depends directly on its reward and the discounted value of its best neighbor.

The paper also covers implementation details of the RBQL algorithm, including the use of the Godot game engine for simulation experiments and the handling of the balance between exploration and exploitation. It then compares RBQL with the standard Q-learning algorithm on maze tasks of different sizes. The experimental results show that RBQL outperforms Q-learning in all test cases, and that the advantage grows with maze size: in larger mazes, RBQL finds the shortest path in fewer steps on average and with more stable performance, whereas Q-learning requires more exploratory steps.
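The backwards propagation step described above can be illustrated with a minimal sketch. It is not the paper's implementation (which uses the Godot engine): the `model` dictionary layout, the function name `propagate_backwards`, and the four-state corridor are all hypothetical, and the sketch uses an iterative breadth-first traversal rather than recursion. It also assumes that the only non-zero reward is given on entering the terminal state, so that the first backward visit to a state yields its best value. With a learning rate of 1, the standard update Q(s, a) ← Q(s, a) + α[r + γ max Q(s', a') − Q(s, a)] reduces to Q(s, a) ← r + γ max Q(s', a'), which is the rule applied here.

```python
from collections import defaultdict, deque


def propagate_backwards(model, terminal, gamma=0.9):
    """Fill a Q-table by walking backwards from the terminal state.

    model maps (state, action) -> (next_state, reward) for every
    transition the agent has observed (hypothetical layout).
    """
    # Reverse index: which observed transitions lead into each state.
    predecessors = defaultdict(list)
    for (state, action), (next_state, reward) in model.items():
        predecessors[next_state].append((state, action, reward))

    q = {}
    value = {terminal: 0.0}  # no future reward after the terminal state
    queue = deque([terminal])
    while queue:
        next_state = queue.popleft()
        for state, action, reward in predecessors[next_state]:
            # Learning rate 1: Q(s, a) = r + gamma * max_a' Q(s', a').
            q[(state, action)] = reward + gamma * value[next_state]
            if state not in value:
                # Assumption: reward only at the goal, so the first
                # backward visit comes from the best successor.
                value[state] = q[(state, action)]
                queue.append(state)
    return q


# Hypothetical corridor 0 - 1 - 2 - 3, where state 3 is the goal.
model = {
    (0, "right"): (1, 0.0),
    (1, "left"): (0, 0.0),
    (1, "right"): (2, 0.0),
    (2, "left"): (1, 0.0),
    (2, "right"): (3, 1.0),
}
q = propagate_backwards(model, terminal=3, gamma=0.9)
```

After a single pass, every explored state-action pair has a value, and in each state the action that moves towards the goal has the higher value. A regular Q-learning agent would need many episodes for the goal reward to reach the far end of the corridor.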

Recursive Backwards Q-Learning in Deterministic Environments

Recursive Reinforcement Learning

Backward Curriculum Reinforcement Learning

Backward Learning for Goal-Conditioned Policies

A Novel Experience-Based Exploration Method for Q-Learning.

Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems

DQN with model-based exploration: efficient learning on environments with sparse rewards

An Improved Dyna-Q Algorithm Inspired by the Forward Prediction Mechanism in the Rat Brain for Mobile Robot Path Planning

A Goal-Conditioned Reinforcement Learning Algorithm with Environment Modeling

Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

From Two-Dimensional to Three-Dimensional Environment with Q-Learning: Modeling Autonomous Navigation with Reinforcement Learning and no Libraries

C-Learning: Learning to Achieve Goals via Recursive Classification

Adaptive Deep Reinforcement Learning for Non-Stationary Environments

Reinforcement Learning in Non-Markovian Environments

Reinforcement learning algorithm for non-stationary environments

Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

Real-Time Recurrent Reinforcement Learning

Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning

Pathfinding in Random Partially Observable Environments with Vision-Informed Deep Reinforcement Learning

Vision-based navigation and obstacle avoidance via deep reinforcement learning