RAMario: Experimental Approach to Reptile Algorithm -- Reinforcement Learning for Mario

Sanyam Jain
2023-05-17
Abstract: This research paper presents an experimental approach to using the Reptile algorithm for reinforcement learning to train a neural network to play Super Mario Bros. We implement the Reptile algorithm using the Super Mario Bros Gym library and TensorFlow in Python, creating a neural network model with a single convolutional layer, a flatten layer, and a dense layer. We define the optimizer and use the Reptile class to create an instance of the Reptile meta-learning algorithm. We train the model using multiple tasks and episodes, choosing actions using the current weights of the neural network model, taking those actions in the environment, and updating the model weights using the Reptile algorithm. We evaluate the performance of the algorithm by printing the total reward for each episode. In addition, we compare the performance of the Reptile algorithm approach to two other popular reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Deep Q-Network (DQN), applied to the same Super Mario Bros task. Our results demonstrate that the Reptile algorithm provides a promising approach to few-shot learning in video game AI, with comparable or even better performance than the other two algorithms, particularly in terms of moves versus distance covered by the agent over 1M episodes of training. The results show that the best total distances for world 1-2 in the game environment were ~1732 (PPO), ~1840 (DQN), and ~2300 (RAMario). Full code is available at https://github.com/s4nyam/RAMario.
Machine Learning, Multiagent Systems
What problem does this paper attempt to address?
The problem this paper attempts to address is how to use the Reptile algorithm for reinforcement learning to train neural networks to play Super Mario Bros and improve few-shot learning capabilities in video game AI. Specifically, the paper implements the Reptile algorithm and compares it with two popular reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Deep Q-Network (DQN), to evaluate its performance in training Mario game agents. The main objectives of the study include:

1. **Improving few-shot learning capabilities**: The Reptile algorithm can quickly adapt to new tasks with limited data, which is particularly important in video game AI.
2. **Enhancing training efficiency and stability**: Compared to PPO and DQN, the Reptile algorithm demonstrates better convergence and stability during training.
3. **Optimizing movement and distance performance**: Experimental results show that the Reptile algorithm outperforms PPO and DQN in terms of the number of movements and distance covered in the Mario game.

Through these objectives, the paper aims to explore the potential applications of the Reptile algorithm in video game AI and provide a reference for future few-shot learning research.
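
To make the Reptile weight update mentioned in the abstract more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the paper trains a TensorFlow network in the Super Mario Bros Gym environment, whereas this sketch assumes a toy quadratic loss in which each "task" is just a target point. All names (`loss_grad`, `inner_sgd`, `reptile_step`, `train`, `TASKS`) and all hyperparameter values are hypothetical and chosen only for illustration. The one thing it aims to show is the standard Reptile rule: adapt a copy of the weights on a single task, then move the shared weights a fraction of the way toward the adapted copy.

```python
import random


def loss_grad(theta, target):
    """Gradient of the toy loss 0.5 * sum((theta_i - target_i) ** 2)."""
    return [t - g for t, g in zip(theta, target)]


def inner_sgd(theta, target, steps, lr):
    """Run a few SGD steps on one task, starting from a copy of theta."""
    phi = list(theta)
    for _ in range(steps):
        grad = loss_grad(phi, target)
        phi = [p - lr * g for p, g in zip(phi, grad)]
    return phi


def reptile_step(theta, target, inner_steps=5, inner_lr=0.1, meta_lr=0.5):
    """One Reptile meta-update: theta <- theta + meta_lr * (phi - theta)."""
    phi = inner_sgd(theta, target, inner_steps, inner_lr)
    return [t + meta_lr * (p - t) for t, p in zip(theta, phi)]


def train(tasks, meta_iterations=200, seed=0):
    """Sample one task per meta-iteration and apply the Reptile update."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # shared initialization that Reptile meta-learns
    for _ in range(meta_iterations):
        target = rng.choice(tasks)
        theta = reptile_step(theta, target)
    return theta


# Hypothetical toy tasks (assumption): each task is a 2-D target point.
TASKS = [[1.0, 2.0], [3.0, 2.0]]
meta_theta = train(TASKS)
print(meta_theta)
```

In this toy setting the meta-learned weights settle between the two task optima, so a few inner steps are enough to adapt to either task. In a reinforcement learning setting such as the one the paper describes, the inner loop would be replaced by gradient steps on episodes from one game task, while the outer interpolation step would keep the same form.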