Reinforcement Learning Based Quantum Circuit Optimization via ZX-Calculus

Jordi Riu, Jan Nogué, Gerard Vilaplana, Artur Garcia-Saez, Marta P. Estarellas
2024-06-04
Abstract: We propose a novel Reinforcement Learning (RL) method for optimizing quantum circuits using graph-theoretic simplification rules of ZX-diagrams. The agent, trained using the Proximal Policy Optimization (PPO) algorithm, employs Graph Neural Networks to approximate the policy and value functions. We demonstrate the capacity of our approach by comparing it against the best-performing ZX-Calculus-based algorithm for the problem at hand. After training on small Clifford+T circuits of 5 qubits and a few tens of gates, the agent consistently improves on the state of the art for this type of circuit, for at least up to 80 qubits and 2100 gates, whilst remaining competitive in terms of computational performance. Additionally, we illustrate its versatility by targeting both total and two-qubit gate count reduction, conveying the potential of tailoring its reward function to the specific characteristics of each hardware backend. Our approach is ready to be used as a valuable tool for the implementation of quantum algorithms on near-term, noisy intermediate-scale quantum (NISQ) devices.
Quantum Physics
What problem does this paper attempt to address?
The paper addresses quantum circuit optimization: how to combine reinforcement learning (RL) with the graphical language of ZX-Calculus to reduce the number of gates in a circuit, in particular the number of two-qubit gates, and thereby improve the efficiency and reliability of circuits run on current quantum devices. Specifically, it proposes an RL method in which the agent is trained with the Proximal Policy Optimization (PPO) algorithm and uses Graph Neural Networks (GNNs) to approximate the policy and value functions. The method aims to overcome the exploration difficulties that an overly large action space causes in traditional algebraic simplification methods. Restricting the agent to the simplification rules of ZX-Calculus reduces the number of action types it must handle, so it can explore and exploit good actions more effectively. After training on small circuits, the agent improves on the best-performing ZX-Calculus-based algorithm for Clifford+T circuits of up to at least 80 qubits and 2,100 gates, while remaining competitive in computational performance. The method is also flexible: its reward function can be adjusted to target either total or two-qubit gate count, which allows it to be tailored to the characteristics of different hardware backends.
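To make the idea of a hardware-tailored reward concrete, the following is a minimal sketch of a per-step reward based on a weighted gate count. It is an illustration only, not the authors' implementation: the names `GateCounts`, `weighted_cost`, and `step_reward`, the `two_qubit_weight` parameter, and the example numbers are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCounts:
    """Hypothetical summary of a circuit: total gates and two-qubit gates."""
    total: int
    two_qubit: int


def weighted_cost(counts: GateCounts, two_qubit_weight: float = 0.0) -> float:
    """Circuit cost: total gate count plus an extra penalty per two-qubit gate.

    two_qubit_weight = 0 targets total gate count only; a larger weight
    (an assumed knob, not taken from the paper) favours removing two-qubit gates.
    """
    return counts.total + two_qubit_weight * counts.two_qubit


def step_reward(before: GateCounts, after: GateCounts,
                two_qubit_weight: float = 0.0) -> float:
    """Reward for one simplification step: the reduction in weighted cost."""
    return weighted_cost(before, two_qubit_weight) - weighted_cost(after, two_qubit_weight)


# Illustrative numbers only: a step that removes 5 gates, 1 of them two-qubit.
before = GateCounts(total=40, two_qubit=12)
after = GateCounts(total=35, two_qubit=11)

total_reward = step_reward(before, after)                     # total gate count only
hw_reward = step_reward(before, after, two_qubit_weight=4.0)  # two-qubit gates penalised
```

Under this assumed form, switching the optimization target between total and two-qubit gate count amounts to changing a single weight, which is one simple way a reward could be adapted to a given backend.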