An Efficient Training Strategy for a Temporal Difference Learning Based Tic-Tac-Toe Automatic Player

Jesús Fernández-Conde, Pedro Cuenca-Jiménez, José María Cañas
DOI: https://doi.org/10.1007/978-3-030-33846-6_47
2019-11-03
Abstract: Temporal Difference (TD) learning is a well-known technique for training automatic players through self-play in board games whose number of possible states is relatively small. TD learning has been widely used because of its simplicity, but there are important issues that need to be addressed. Training the AI agent against a random player is not effective, as several million games are needed before the automatic player starts to play intelligently. On the other hand, training it against a perfect player is not an acceptable option because the agent would then explore too little of the state space. In this paper we present an efficient training strategy for a TD-based automatic game player, which is shown to outperform other techniques, needing only roughly two hundred thousand training games to behave like a perfect player. We present the results obtained by simulation for the classic Tic-Tac-Toe game.
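The abstract does not give the learning rule or the details of the proposed strategy. As a point of reference, here is a minimal sketch of the baseline it improves upon: a generic tabular TD(0) agent that learns afterstate values for Tic-Tac-Toe while playing against a uniformly random opponent. The board encoding, the step size `alpha=0.1`, the exploration rate `epsilon=0.1`, the rewards (win 1.0, draw 0.5, loss 0.0) and all names (`TDAgent`, `play`, `train`) are hypothetical choices for illustration, not the authors' settings.

```python
import random

# Hypothetical board encoding: a list of 9 cells, row-major, each 'X', 'O' or ' '.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that mark fills a line, otherwise None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def moves(board):
    """Indices of the empty cells."""
    return [i for i, s in enumerate(board) if s == ' ']


class TDAgent:
    """Tabular TD(0) learner over afterstates (boards right after its own move)."""

    def __init__(self, mark, alpha=0.1, epsilon=0.1):
        self.mark = mark
        self.alpha = alpha      # step size (assumed value)
        self.epsilon = epsilon  # probability of a random exploratory move (assumed)
        self.V = {}             # afterstate key -> estimated value in [0, 1]
        self.prev = None        # previous afterstate, waiting for its TD update

    def value(self, board):
        key = ''.join(board)
        if key not in self.V:
            # Winning afterstates start at 1.0, all others at a neutral 0.5.
            self.V[key] = 1.0 if winner(board) == self.mark else 0.5
        return self.V[key]

    def learn(self, target):
        # TD(0) update: V(s) <- V(s) + alpha * (target - V(s)).
        if self.prev is not None:
            key = ''.join(self.prev)
            self.V[key] += self.alpha * (target - self.V[key])

    def act(self, board):
        options = []
        for m in moves(board):
            after = board.copy()
            after[m] = self.mark
            options.append((self.value(after), m, after))
        if random.random() < self.epsilon:
            v, m, after = random.choice(options)  # explore
        else:
            v, m, after = max(options)            # exploit (ties -> highest index)
        self.learn(v)  # back up the new afterstate's value into the previous one
        self.prev = after
        return m

    def end_game(self, reward):
        self.learn(reward)  # final update toward the game outcome
        self.prev = None


def play(agent):
    """One game of the agent against a uniformly random opponent; X moves first."""
    other = 'O' if agent.mark == 'X' else 'X'
    board, turn = [' '] * 9, 'X'
    while True:
        if turn == agent.mark:
            board[agent.act(board)] = agent.mark
        else:
            board[random.choice(moves(board))] = other
        w = winner(board)
        if w is not None or not moves(board):
            # Assumed rewards: win 1.0, draw 0.5, loss 0.0.
            agent.end_game(1.0 if w == agent.mark else 0.5 if w is None else 0.0)
            return w
        turn = other if turn == agent.mark else agent.mark


def train(games, seed=0):
    """Train a fresh X-playing agent for the given number of games."""
    random.seed(seed)
    agent = TDAgent('X')
    for _ in range(games):
        play(agent)
    return agent
```

The random opponent inside `play` is the part a training strategy such as the one proposed would replace; since the abstract does not describe that strategy, it is not reproduced here.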