Theoretical Analysis of the Efficient-Memory Matrix Storage Method for Quantum Emulation Accelerators with Gate Fusion on FPGAs

Tran Xuan Hieu Le, Hoai Luan Pham, Tuan Hai Vu, Vu Trung Duong Le, Nakashima Yasuhiko
2024-10-15
Abstract: Quantum emulators play an important role in the development and testing of quantum algorithms, especially given the limitations of the current FTQC era. Developing high-speed, memory-optimized quantum emulators is a growing research trend, with gate fusion being a promising technique. However, existing gate fusion implementations often struggle to efficiently support large-scale quantum systems with a high number of qubits due to a lack of optimizations for the exponential growth in memory requirements. Therefore, this study proposes the EMMS (Efficient-Memory Matrix Storage) method for storing quantum operators and states, along with an EMMS-based Quantum Emulator Accelerator (QEA) architecture that incorporates multiple processing elements (PEs) to accelerate tensor product and matrix multiplication computations in quantum emulation with gate fusion. The theoretical analysis of the QEA on the Xilinx ZCU102 FPGA, using varying numbers of PEs and different depths of unitary and local data memory, reveals a linear increase in memory depth with the number of qubits. This scaling highlights the potential of the EMMS-based QEA to accommodate larger quantum circuits, providing insights into selecting appropriate memory sizes and FPGA devices. Furthermore, the estimated performance of the QEA with PE counts ranging from $2^2$ to $2^5$ on the Xilinx ZCU102 FPGA demonstrates that increasing the number of PEs significantly reduces the computation cycle count for circuits with fewer than 18 qubits, making it significantly faster than previous works.
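To make the "exponential growth in memory requirements" concrete, the sketch below computes what a naive dense emulator would need to store an $n$-qubit state vector ($2^n$ amplitudes) and a fully fused $2^n \times 2^n$ unitary. It assumes 16 bytes per complex amplitude (two IEEE 754 doubles); the number format used on the FPGA is not specified here and may differ, and the function name `dense_bytes` is illustrative only.

```python
def dense_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> tuple[int, int]:
    """Return (state_vector_bytes, unitary_bytes) for dense n-qubit storage.

    Assumption: each complex amplitude takes two float64 values (16 bytes).
    """
    dim = 2 ** num_qubits                              # Hilbert-space dimension
    state_bytes = dim * bytes_per_amplitude            # |psi> holds 2^n amplitudes
    unitary_bytes = dim * dim * bytes_per_amplitude    # dense U is 2^n x 2^n
    return state_bytes, unitary_bytes


# Compare a small circuit, the 18-qubit point mentioned above, and a larger one.
for n in (10, 18, 30):
    s, u = dense_bytes(n)
    print(f"n={n:2d}: state {s / 2**20:.3f} MiB, unitary {u / 2**30:.3f} GiB")
```

The state vector alone grows as $2^n$, but a dense fused unitary grows as $4^n$, which is why how operators are stored matters as much as how states are stored.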
Hardware Architecture
What problem does this paper attempt to address?
The problem this paper attempts to solve is the following: in quantum emulation, the memory requirement grows exponentially with the number of qubits, which makes it difficult for existing gate-fusion-based quantum emulators to support large-scale quantum systems efficiently. Specifically:

1. **Memory requirement problem**: Existing emulation methods must store very large matrices and state vectors when the number of qubits is high, so memory consumption grows rapidly and limits the size of circuits that can be emulated.
2. **Computational efficiency problem**: The matrix multiplication and tensor product operations used in conventional methods become very time-consuming as the number of qubits increases, which slows down emulation.

To address these problems, this work proposes the Efficient-Memory Matrix Storage (EMMS) method and designs a Quantum Emulator Accelerator (QEA) based on it. EMMS reduces memory usage by changing how matrices are stored, while the QEA improves computational speed by distributing the work across multiple processing elements (PEs). The QEA targets field-programmable gate arrays (FPGAs), with the aim of emulating larger quantum circuits more efficiently.

### Main Contributions

1. **EMMS method**: Large matrices are decomposed into smaller sub-matrices and stored in a sparse format (such as COO), which significantly reduces the memory requirement.
2. **QEA architecture**: Multiple PEs operate in parallel to accelerate tensor product and matrix multiplication operations, improving overall computational efficiency.
3. **Theoretical analysis and evaluation**: Configurations with different numbers of PEs and memory depths are analyzed to show the performance advantages of the EMMS method on quantum circuits of different sizes.
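As a rough illustration of why sparse storage helps, the sketch below keeps gate matrices in a generic COO form (parallel row, column and value lists) and forms their tensor product directly in COO, so a layer of $k$ single-qubit Pauli gates stores $2^k$ non-zeros instead of $4^k$ dense entries. This is a generic COO sketch under that assumption, not the paper's EMMS memory layout, and the names `COO`, `kron_coo` and `matvec_coo` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class COO:
    """Sparse matrix in coordinate (COO) format: parallel row/col/value lists."""
    shape: tuple[int, int]
    rows: list[int]
    cols: list[int]
    vals: list[complex]

    @staticmethod
    def from_dense(m: list[list[complex]]) -> "COO":
        rows, cols, vals = [], [], []
        for i, row in enumerate(m):
            for j, v in enumerate(row):
                if v != 0:  # keep only non-zero entries
                    rows.append(i)
                    cols.append(j)
                    vals.append(v)
        return COO((len(m), len(m[0])), rows, cols, vals)


def kron_coo(a: COO, b: COO) -> COO:
    """Tensor product a (x) b computed entry by entry, without densifying.

    Entry (i, j) of a and entry (k, l) of b map to (i*rb + k, j*cb + l).
    """
    rb, cb = b.shape
    rows, cols, vals = [], [], []
    for i, j, x in zip(a.rows, a.cols, a.vals):
        for k, l, y in zip(b.rows, b.cols, b.vals):
            rows.append(i * rb + k)
            cols.append(j * cb + l)
            vals.append(x * y)
    return COO((a.shape[0] * rb, a.shape[1] * cb), rows, cols, vals)


def matvec_coo(m: COO, v: list[complex]) -> list[complex]:
    """Sparse matrix-vector product: only stored non-zeros are touched."""
    out = [0j] * m.shape[0]
    for i, j, x in zip(m.rows, m.cols, m.vals):
        out[i] += x * v[j]
    return out


X = COO.from_dense([[0, 1], [1, 0]])   # Pauli-X
Z = COO.from_dense([[1, 0], [0, -1]])  # Pauli-Z

# Layer unitary X (x) Z (x) X on 3 qubits.
U = X
for g in (Z, X):
    U = kron_coo(U, g)
print(len(U.vals), U.shape[0] * U.shape[1])  # non-zeros vs dense entries: 8 64
```

Applying `matvec_coo(U, [1, 0, 0, 0, 0, 0, 0, 0])` maps $|000\rangle$ to $|101\rangle$ (index 5), using the convention that the leftmost factor acts on the most significant qubit.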
### Key Formulas

- Quantum state evolution:
\[ |\psi(m)\rangle = U(\theta)\,|\psi(0)\rangle = \left(\prod_{t=1}^{m} U^{(t)}\big(\theta^{(t)}\big)\right)|\psi(0)\rangle \]
where the product is ordered so that $U^{(1)}$ acts first and $U^{(m)}$ last.
- Per-layer update, split into a tensor product followed by a matrix multiplication:
\[ |\psi(t+1)\rangle = U^{(t+1)}\,|\psi(t)\rangle = \left(G_{1}^{(t+1)} \otimes G_{2}^{(t+1)} \otimes \cdots \otimes G_{k}^{(t+1)}\right)|\psi(t)\rangle \]
where $G_{i}^{(t+1)}$ are the gates acting in layer $t+1$.
- Computation cycle count:
\[ C_{QC} = C_{\text{Write}\,|\psi(0)\rangle} + \sum_{t=0}^{m-1}\left(C_{U^{(t+1)}}^{\text{TP}} + C_{U^{(t+1)}}^{\text{MM}}\right) + C_{\text{Read}\,|\psi(m)\rangle} \]
where TP and MM denote the tensor-product and matrix-multiplication stages for layer $t+1$.

Through these methods and this architecture, the work aims to make quantum emulation more efficient, especially for large-scale quantum systems, and to achieve a better balance between memory usage and computational speed.
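The formulas describe emulation as a loop over layers: build $U^{(t+1)}$ with a tensor product (TP), multiply it into the state (MM), and add up the cycle cost of every stage. Below is a minimal dense, pure-Python sketch of that loop. It ignores EMMS storage and PE parallelism, the per-stage costs passed to `total_cycles` are hypothetical placeholders (the real costs depend on the QEA configuration), and all names are illustrative.

```python
import math
from functools import reduce
from typing import Callable

Matrix = list[list[complex]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Dense tensor product a (x) b: the TP stage."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matvec(m: Matrix, v: list[complex]) -> list[complex]:
    """Dense matrix-vector product U |psi>: the MM stage."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def emulate(layers: list[list[Matrix]], psi0: list[complex]) -> list[complex]:
    """Compute |psi(m)> = U(m) ... U(1) |psi(0)>, one layer at a time."""
    psi = psi0
    for gates in layers:            # t = 0 .. m-1
        u = reduce(kron, gates)     # U(t+1) = G_1 (x) ... (x) G_k
        psi = matvec(u, psi)        # |psi(t+1)> = U(t+1) |psi(t)>
    return psi


def total_cycles(c_write: int, c_tp: Callable[[int], int],
                 c_mm: Callable[[int], int], c_read: int, num_layers: int) -> int:
    """C_QC = C_write + sum_t (C_TP(t+1) + C_MM(t+1)) + C_read (placeholder costs)."""
    return c_write + sum(c_tp(t + 1) + c_mm(t + 1) for t in range(num_layers)) + c_read


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

print(emulate([[X, I2]], [1, 0, 0, 0]))   # X on the first qubit: |00> -> |10>
print(emulate([[H, H]], [1, 0, 0, 0]))    # uniform superposition, all ~0.5
print(total_cycles(10, lambda t: 4, lambda t: 8, 10, num_layers=3))
```

Building the full layer unitary with `reduce(kron, ...)` is exactly the cost the paper targets: its size is $4^n$ entries, which the dense sketch makes explicit.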