Reinforcement Learning: Playing Tic-Tac-Toe

Jocelyn Ho, Jeffrey Huang, Benjamin Chang, Allison Liu, Zoe Liu
DOI: https://doi.org/10.47611/jsr.v11i3.1739
2023-03-08
Journal of Student Research
Abstract: Machine learning constructs computer systems that improve through experience. Its applications span everyday disciplines, ranging from malware filtering to image recognition. Recent research has shifted towards maximizing efficiency in decision-making, creating algorithms that quickly and accurately process patterns to generate insight. This research focuses on reinforcement learning, a paradigm of machine learning that makes decisions by maximizing reward. Specifically, we use Q-learning, a model-free reinforcement learning algorithm, to assign scores to different decisions given the unique states of the problem. Widyantoro et al. (2009) studied the effect of Q-learning on learning to play Tic-Tac-Toe. However, that study yielded a win/tie rate of less than 50 percent, which we believe does not reflect an algorithm that fully exploits the benefits of Q-learning. In the same environment, this research aims to close the gaps in the effectiveness of Q-learning while minimizing human input. Training began with the epsilon value set to 0.9 to ensure randomness; epsilon was then decreased at a constant rate as the number of possible states increased. The program played 300,000 games against its previous version, eventually securing a win/tie rate of approximately 90 percent. Future directions include improving the efficiency of Q-learning algorithms and applying the research in practical fields.
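To make the mechanics in the abstract concrete, the following is a minimal sketch of tabular Q-learning with an epsilon-greedy policy and a constant-rate epsilon decay. It is an illustration under stated assumptions, not the authors' implementation: only the starting epsilon of 0.9 comes from the abstract. The learning rate, discount factor, epsilon floor, decay step, the board encoding as a 9-character string, and all function names are hypothetical. The abstract ties the decay to the growth in possible states, whereas this sketch decays per game played for simplicity.

```python
import random
from collections import defaultdict

# Only EPSILON_START is taken from the abstract; every other value is assumed.
ALPHA = 0.1          # learning rate (assumed)
GAMMA = 0.9          # discount factor (assumed)
EPSILON_START = 0.9  # initial epsilon, as stated in the abstract
EPSILON_MIN = 0.05   # floor on epsilon (assumed)
EPSILON_STEP = 1e-5  # constant decrement per game (assumed schedule)

# Q-table mapping (board, action) -> estimated value; unseen pairs default to 0.
q_table = defaultdict(float)


def legal_actions(board):
    """Return the indices of empty cells on a 9-character board string."""
    return [i for i, cell in enumerate(board) if cell == " "]


def choose_action(board, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    actions = legal_actions(board)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(board, a)])


def q_update(board, action, reward, next_board, done):
    """Apply one tabular Q-learning update for a single transition."""
    if done:
        best_next = 0.0  # terminal states have no future value
    else:
        best_next = max(
            (q_table[(next_board, a)] for a in legal_actions(next_board)),
            default=0.0,
        )
    target = reward + GAMMA * best_next
    q_table[(board, action)] += ALPHA * (target - q_table[(board, action)])


def decayed_epsilon(games_played):
    """Decrease epsilon linearly with games played, clamped at EPSILON_MIN."""
    return max(EPSILON_MIN, EPSILON_START - EPSILON_STEP * games_played)
```

In a two-player setting, `next_board` would be the position the agent sees after the opponent has replied; how the opponent (for example, an earlier version of the agent) is selected is left outside this sketch. A high starting epsilon makes early play mostly random, so the table is populated broadly before the policy becomes greedy.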