Abstract: Quantum Tiq-Taq-Toe is a well-known benchmark and playground for both quantum computing and machine learning. Despite its popularity, no reinforcement learning (RL) methods have been applied to Quantum Tiq-Taq-Toe. Although there has been some research on Quantum Chess, that game is significantly more complex in terms of computation and analysis. Therefore, we study the combination of quantum computing and reinforcement learning in Quantum Tiq-Taq-Toe, which may serve as an accessible testbed for the integration of both fields. Quantum games are challenging to represent classically due to their inherent partial observability and the potential for exponential state complexity. In Quantum Tiq-Taq-Toe, states are observed through Measurement (a 3x3 matrix of state probabilities) and Move History (a 9x9 matrix of entanglement relations), making strategy complex as each move can collapse the quantum state.
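To make the two observation components concrete, here is a minimal sketch of how they might be held in memory. It is not the paper's implementation: the function name `make_observation`, the 0 to 8 row-major cell indexing, the single probability value per cell, and the use of binary marks for entanglement relations are all assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

N_CELLS = 9  # cells of the 3x3 board, indexed 0..8 row by row (assumed)


def make_observation(
    cell_probs: Sequence[float],
    entangled_pairs: Iterable[Tuple[int, int]],
) -> Tuple[List[List[float]], List[List[int]]]:
    """Build a 3x3 Measurement matrix and a 9x9 Move History matrix.

    cell_probs: one probability per cell (hypothetical encoding; what the
        value stands for depends on the environment).
    entangled_pairs: (i, j) cell indices linked by an entangling move.
    """
    if len(cell_probs) != N_CELLS:
        raise ValueError("expected one probability per cell")

    # Measurement: reshape the flat probabilities into board layout.
    measurement = [list(cell_probs[r * 3:(r + 1) * 3]) for r in range(3)]

    # Move History: mark each entanglement relation symmetrically.
    history = [[0] * N_CELLS for _ in range(N_CELLS)]
    for i, j in entangled_pairs:
        history[i][j] = 1
        history[j][i] = 1
    return measurement, history


measurement, history = make_observation([0.5] * 9, [(0, 4)])
```

Under these assumptions an agent's input has 9 + 81 = 90 entries, far fewer than a full description of the underlying quantum state would need, which is one way to see why the game is only partially observable from the agent's side.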

What problem does this paper attempt to address?

This paper applies reinforcement learning (RL) to Quantum Tiq-Taq-Toe. Although the game is a well-known benchmark and experimental platform in quantum computing and machine learning, no prior research has applied RL methods to it. The paper aims to fill this gap and to explore how quantum computing and RL can be combined.

### Research Background

1. **Introduction to Quantum Tiq-Taq-Toe**:
   - Quantum Tiq-Taq-Toe is a quantum version of classic Tic-Tac-Toe in which each cell can be in one of three states (empty, X, or O), and these states can exist in quantum superposition.
   - Actions include not only classical moves (such as changing an empty cell to X or O) but also quantum entanglement moves, which make the state-space complexity grow exponentially.
2. **Challenges of the Problem**:
   - Because the quantum system is only partially observable and its state complexity grows exponentially, classical methods struggle to represent and process the game's states effectively.
   - This complexity makes it difficult to apply traditional RL algorithms directly to Quantum Tiq-Taq-Toe.

### Research Objectives

The paper's main objective is to address the strategy-learning problem in Quantum Tiq-Taq-Toe with RL algorithms. Specifically, it sets out to:

- **Explore the Combination of Quantum Computing and Reinforcement Learning**: Investigate how RL algorithms can learn effective strategies in a quantum environment.
- **Develop Learning Methods Suitable for the Quantum Environment**: Propose learning algorithms that can handle partial observability and complex state spaces.
- **Verify Performance under Different Rules**: Compare how RL algorithms perform under different versions of the Quantum Tiq-Taq-Toe rules.

### Methodology

To achieve these objectives, the paper adopts the following methods:

1. **Define Two Versions of Game Rules**:
   - **Version 1 (V1)**: An entanglement action must include at least one empty cell, similar to the traditional rules.
   - **Version 3 (V3)**: An entanglement action is allowed between any two cells, which increases the strategic depth.
2. **Design the Observation Space**:
   - **Measurement Matrix**: Records the probability of each cell being in the empty, X, or O state.
   - **Historical Action Matrix**: Records the history of past entanglement actions and classical actions.
3. **Use the PPO Algorithm for Self-Play Training**:
   - Train the agent with the Proximal Policy Optimization (PPO) algorithm and evaluate its performance under different information conditions.

### Results and Discussion

- **Results of Version 1**: When entanglement actions are restricted, the first-moving player (X) often has an advantage, despite the randomness in the game.
- **Results of Version 3**: With the less restricted entanglement actions, the agent performs best when it observes both the measurement matrix and the historical action matrix, which reflects the importance of complete information.

### Future Work

The paper suggests that future research could explore other ways to alleviate the partial observability problem, such as state windows, Recurrent Neural Networks (RNNs), Recursive State-Space Models, or Transformers.

### Summary

This paper explores new ways of combining quantum computing and machine learning by applying RL to Quantum Tiq-Taq-Toe. The results show that, under appropriate rules and information conditions, RL can effectively learn strategies in a quantum environment, providing a useful reference for further research.
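The difference between the two rule versions in the Methodology can be illustrated by enumerating which cell pairs an entanglement action may target. The sketch below is an illustration under stated assumptions, not the paper's code: the function name `entanglement_pairs`, the `"V1"`/`"V3"` string labels, the 0 to 8 cell indexing, and the idea that the caller supplies a per-cell `empty` flag are all hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def entanglement_pairs(empty: Sequence[bool], version: str) -> List[Tuple[int, int]]:
    """List the cell pairs an entanglement action may target.

    empty: nine flags, True where the caller treats the cell as empty
        (how emptiness is decided is left to the environment; assumed here).
    version: "V1" requires at least one empty cell in the pair,
        "V3" allows any two distinct cells.
    """
    if len(empty) != 9:
        raise ValueError("expected nine cells")

    all_pairs = list(combinations(range(9), 2))  # 36 unordered pairs
    if version == "V3":
        return all_pairs
    if version == "V1":
        return [(i, j) for i, j in all_pairs if empty[i] or empty[j]]
    raise ValueError(f"unknown rule version: {version}")


# Example: only cell 0 is still empty.
flags = [True] + [False] * 8
v1_moves = entanglement_pairs(flags, "V1")  # pairs that include cell 0
v3_moves = entanglement_pairs(flags, "V3")  # every pair of cells
```

A list like this could serve as an action mask during self-play training. It also shows why V3 keeps the entanglement action set large throughout the game, while under V1 it shrinks as cells fill up.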

Reinforcement learning for Quantum Tiq-Taq-Toe
