Abstract: Quantum emulators play an important role in the development and testing of quantum algorithms, especially given the limitations of the current FTQC era. Developing high-speed, memory-optimized quantum emulators is a growing research trend, with gate fusion being a promising technique. However, existing gate fusion implementations often struggle to efficiently support large-scale quantum systems with a high number of qubits because they lack optimizations for the exponential growth in memory requirements. Therefore, this study proposes the EMMS (Efficient-Memory Matrix Storage) method for storing quantum operators and states, along with an EMMS-based Quantum Emulator Accelerator (QEA) architecture that incorporates multiple processing elements (PEs) to accelerate tensor product and matrix multiplication computations in quantum emulation with gate fusion. The theoretical analysis of the QEA on the Xilinx ZCU102 FPGA, using varying numbers of PEs and different depths of unitary and local data memory, reveals a linear increase in memory depth with the number of qubits. This scaling highlights the potential of the EMMS-based QEA to accommodate larger quantum circuits and provides insights into selecting appropriate memory sizes and FPGA devices. Furthermore, the estimated performance of the QEA with PE counts ranging from $2^2$ to $2^5$ on the Xilinx ZCU102 FPGA demonstrates that increasing the number of PEs substantially reduces the computation cycle count for circuits with fewer than 18 qubits, making it significantly faster than previous works.

What problem does this paper attempt to address?

The paper addresses the following problem: in quantum simulation, the memory requirement grows exponentially with the number of qubits, which makes it difficult for existing quantum simulators based on gate fusion to support large-scale quantum systems efficiently. Specifically:

1. **Memory Requirement Problem**: Existing quantum simulation methods must store huge matrices and state vectors when dealing with a large number of qubits, so memory consumption increases rapidly and limits the scale of simulation.
2. **Computational Efficiency Problem**: Matrix multiplication and tensor product operations in traditional methods become very time-consuming as the number of qubits increases, which reduces the speed and efficiency of simulation.

To solve these problems, this research proposes an Efficient-Memory Matrix Storage (EMMS) method and designs a Quantum Emulation Accelerator (QEA) based on it. EMMS reduces memory usage by optimizing how matrices are stored, while the QEA improves computational speed through multiple Processing Elements (PEs). The QEA architecture uses Field-Programmable Gate Arrays (FPGAs) to implement these optimizations, so that large-scale quantum circuits can be simulated more efficiently.

### Main Contributions

1. **Proposing the EMMS Method**: By decomposing large matrices into smaller sub-matrices and adopting a sparse matrix storage format (such as COO format), the memory requirement is significantly reduced.
2. **Designing the QEA Architecture**: Parallel processing across multiple PEs accelerates tensor product and matrix multiplication operations and improves overall computational efficiency.
3. **Theoretical Analysis and Evaluation**: By analyzing configurations with different numbers of PEs and memory depths, the performance advantages of the EMMS method on quantum circuits of different scales are demonstrated.
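The idea of storing operators in a sparse coordinate (COO) format and combining them by tensor product before a matrix multiplication can be illustrated with a minimal Python sketch. This is not the paper's hardware layout: the dictionary-based `Coo` type and the helper names `to_coo`, `kron_coo`, and `matvec_coo` are assumptions introduced here purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical COO-style storage: only nonzero entries, keyed by (row, col).
Coo = Dict[Tuple[int, int], complex]


def to_coo(dense: List[List[complex]]) -> Coo:
    """Keep only the nonzero entries of a dense matrix."""
    return {
        (r, c): v
        for r, row in enumerate(dense)
        for c, v in enumerate(row)
        if v != 0
    }


def kron_coo(a: Coo, b: Coo, dim_b: int) -> Coo:
    """Tensor (Kronecker) product a (x) b, where b is dim_b x dim_b."""
    return {
        (ra * dim_b + rb, ca * dim_b + cb): va * vb
        for (ra, ca), va in a.items()
        for (rb, cb), vb in b.items()
    }


def matvec_coo(u: Coo, state: List[complex]) -> List[complex]:
    """Multiply a COO matrix by a state vector, touching nonzeros only."""
    out = [0j] * len(state)
    for (r, c), v in u.items():
        out[r] += v * state[c]
    return out


# Two single-qubit gates fused into one 2-qubit operator.
X = to_coo([[0, 1], [1, 0]])
I2 = to_coo([[1, 0], [0, 1]])
fused = kron_coo(X, I2, dim_b=2)   # X on the most significant qubit

psi0 = [1, 0, 0, 0]                # |00>
psi1 = matvec_coo(fused, psi0)     # amplitude moves to index 2, i.e. |10>
```

In this toy case the fused operator stores 4 nonzero entries instead of the 16 entries of the dense 4 x 4 matrix, which is the kind of saving a sparse format is meant to provide.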
### Key Formulas

- Quantum State Evolution Formula:
  \[ |\psi^{(m)}\rangle = U(\theta)\times|\psi^{(0)}\rangle=\left(\prod_{t = 1}^{m}U^{(t)}(\theta^{(t)})\right)\times|\psi^{(0)}\rangle \]
- Decomposition of Tensor Product and Matrix Multiplication:
  \[ T(G^{(t + 1)})|\psi^{(t)}\rangle=(T(G)\otimes T(G))|\psi^{(t)}\rangle \]
- Computation Cycle Formula:
  \[ C_{QC}=C_{\text{Write}\,|\psi^{(0)}\rangle}+\sum_{t = 0}^{m - 1}\left(C_{U^{(t + 1)}}^{\text{TP}}+C_{U^{(t + 1)}}^{\text{MM}}\right)+C_{\text{Read}\,|\psi^{(m)}\rangle} \]

Through these methods and architectures, this research aims to make quantum simulation more efficient, especially for large-scale quantum systems, and to achieve a better balance between memory usage and computational speed.
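The computation cycle formula is a plain sum: one term for writing the initial state, one tensor-product (TP) term and one matrix-multiplication (MM) term per layer, and one term for reading the final state. A minimal Python sketch of that bookkeeping follows. The function name `total_cycles` and all cycle values in the example are placeholders chosen for illustration; none of them are figures from the paper.

```python
from typing import Sequence


def total_cycles(
    write_cycles: int,
    tp_cycles: Sequence[int],
    mm_cycles: Sequence[int],
    read_cycles: int,
) -> int:
    """Sum the cycle formula: write + sum over layers (TP + MM) + read.

    tp_cycles[t] and mm_cycles[t] are the cycle counts for layer t + 1.
    """
    if len(tp_cycles) != len(mm_cycles):
        raise ValueError("each layer needs one TP and one MM cycle count")
    per_layer = sum(tp + mm for tp, mm in zip(tp_cycles, mm_cycles))
    return write_cycles + per_layer + read_cycles


# Placeholder numbers for a 3-layer circuit (m = 3), not measured values.
example = total_cycles(
    write_cycles=16,
    tp_cycles=[8, 8, 8],
    mm_cycles=[32, 32, 32],
    read_cycles=16,
)
# 16 + 3 * (8 + 32) + 16 = 152
```

Because the per-layer terms dominate as the number of layers m grows, shortening the TP and MM terms (for example through more PEs) is where a speedup would show up in this model.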

Theoretical Analysis of the Efficient-Memory Matrix Storage Method for Quantum Emulation Accelerators with Gate Fusion on FPGAs

Efficient FPGA Emulation of Quantum Fourier Transform

Mera: Memory Reduction and Acceleration for Quantum Circuit Simulation via Redundancy Exploration

FQsun: A Configurable Wave Function-Based Quantum Emulator for Power-Efficient Quantum Simulations

MEMQSim: Highly Memory-Efficient and Modularized Quantum State-Vector Simulation

Optimising Iteration Scheduling for Full-State Vector Simulation of Quantum Circuits on FPGAs

ReQUSA: A novel ReRAM-based hardware accelerator architecture for high-speed quantum computer simulation

AMARETTO: Enabling Efficient Quantum Algorithm Emulation on Low-Tier FPGAs

Project and Implementation of a Quantum Logic Gate Emulator on FPGA Using a Model-Based Design Approach

SEE-MCAM: Scalable Multi-bit FeFET Content Addressable Memories for Energy Efficient Associative Search

Fast scalable and low-power quantum circuit simulation on the cluster of GPUs platforms

An In-Memory-Computing Structure with Quantum-Dot Transistor Toward Neural Network Applications: From Analog Circuits to Memory Arrays

Fast emulation of fermionic circuits with matrix product states

Toward cost-effective quantum circuit simulation with performance tuning techniques

An optimization of traditional CPU emulation techniques for execution on a quantum computer

Overcoming Memory Constraints in Quantum Circuit Simulation with a High-Fidelity Compression Framework

A Scalable FPGA Architecture for Quantum Computing Simulation

Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation

Quantum Memory: A Missing Piece in Quantum Computing Units

Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection

Efficient Quantum Circuit Simulation by Tensor Network Methods on Modern GPUs