Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation

Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhiling Pei, Xingcheng Zhang, Wanli Ouyang
2024-07-01
Abstract: Quantum computational superiority promises both rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's Sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a large-scale system technology that applies optimizations at the global, node, and device levels to achieve unprecedented scalability for tensor networks. Our techniques accommodate large-scale tensor networks that require up to tens of terabytes of memory, surpassing the memory constraints of a single node, and scale up to 2304 GPUs with a peak half-precision computing power of 561 PFLOPS. Notably, we achieve a time-to-solution of 14.22 seconds with an energy consumption of 2.39 kWh at a fidelity of 0.002. Our most remarkable result is a time-to-solution of 17.18 seconds with an energy consumption of only 0.29 kWh, reaching an XEB of 0.002 after post-processing and outperforming Google's Sycamore quantum processor, which recorded 600 seconds and 4.3 kWh, in both speed and energy efficiency.
Quantum Physics; Distributed, Parallel, and Cluster Computing
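The fidelity and XEB values of 0.002 quoted above are typically measured with linear cross-entropy benchmarking, the metric used in Google's Sycamore experiment. For k bitstrings sampled from an n-qubit circuit, F_XEB = 2^n * mean(p_ideal(x_i)) - 1: uniformly random bitstrings score about 0, and a perfect sampler of a Porter-Thomas distribution scores about 1. The following minimal sketch computes this estimator on a toy 3-qubit distribution. The helper names `linear_xeb` and `sample_bitstrings` and the distribution itself are illustrative assumptions; in the actual simulation the ideal probabilities would come from contracting the circuit's tensor network.

```python
import random


def linear_xeb(ideal_probs, n_qubits):
    """Linear XEB estimate F = 2^n * mean(p_ideal(x_i)) - 1.

    ideal_probs holds the ideal (noiseless) probability of each sampled bitstring.
    """
    if not ideal_probs:
        raise ValueError("need at least one sample")
    mean_p = sum(ideal_probs) / len(ideal_probs)
    return (2 ** n_qubits) * mean_p - 1.0


def sample_bitstrings(probs, k, rng):
    """Toy exact sampler (hypothetical): draw k bitstring indices with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=k)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 3
    # Toy ideal output distribution over the 2^3 bitstrings (sums to 1).
    probs = [0.30, 0.20, 0.15, 0.10, 0.10, 0.08, 0.05, 0.02]
    good = sample_bitstrings(probs, 100_000, rng)
    noise = [rng.randrange(2 ** n) for _ in range(100_000)]
    # A faithful sampler scores near 2^n * sum(p^2) - 1; uniform noise scores near 0.
    print("faithful:", linear_xeb([probs[i] for i in good], n))
    print("uniform: ", linear_xeb([probs[i] for i in noise], n))
```

For this toy distribution a faithful sampler does not reach 1, because the estimator's ideal value is 2^n * sum(p^2) - 1, which equals 1 only for Porter-Thomas statistics.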
What problem does this paper attempt to address?
This paper addresses system-level optimization of quantum circuit simulation, with the goal of achieving superior computational speed and energy efficiency. Despite recent progress in classical algorithms, generating uncorrelated samples for random quantum circuits remains a challenge. The researchers propose a large-scale system technology that improves the scalability of tensor networks through optimization at the global, node, and device levels, overcoming the memory limits of individual nodes by handling tensor networks that require up to tens of terabytes of memory. The system scales to 2304 GPUs with a peak half-precision computing power of 561 PFLOPS and achieves a time-to-solution of 14.22 seconds with an energy consumption of 2.39 kWh at a fidelity of 0.002. A second configuration achieves a time-to-solution of 17.18 seconds with an energy consumption of only 0.29 kWh (an XEB of 0.002 after post-processing), surpassing Google's Sycamore quantum processor (600 seconds and 4.3 kWh) in both speed and energy efficiency.
The main objectives of the paper are:
1. Achieve an order-of-magnitude reduction in computation time, a significant milestone showing that classical computers can not only keep pace with quantum computers but even surpass them on certain tasks.
2. Achieve energy efficiency in circuit simulation that surpasses quantum processors, in line with global commitments to sustainability and environmental protection.
The research proposes the following system-level techniques:
1. A three-level (global, node, and device) strategy that makes full use of distributed memory and improves energy efficiency.
2. A hybrid communication strategy that maximizes intra-node bandwidth utilization, reduces inter-node data transfers, and applies low-precision quantization to reduce communication volume.
3. Einsum expansion for complex half-precision data, which reduces memory requirements and enables high-speed fp16 tensor core computation.
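The hybrid communication strategy relies partly on low-precision quantization to shrink the data exchanged between nodes. The summary does not specify the format, so the following is a minimal sketch assuming a generic per-block symmetric int8 scheme: real and imaginary parts are interleaved, split into blocks, and each block is scaled by its own maximum absolute value. The function names `quantize_int8` and `dequantize_int8` and the block size of 256 are hypothetical, not taken from the paper.

```python
import numpy as np


def quantize_int8(x, block=256):
    """Hypothetical per-block symmetric int8 quantization of a complex array."""
    # Interleave (real, imag) into one flat real array.
    flat = np.stack([x.real, x.imag], axis=-1).astype(np.float32).ravel()
    pad = (-flat.size) % block
    blocks = np.pad(flat, (0, pad)).reshape(-1, block)
    # One fp32 scale per block maps the block's max |value| to 127.
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # all-zero blocks: avoid division by zero
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return q, scale, x.shape, pad


def dequantize_int8(q, scale, shape, pad):
    """Invert quantize_int8 on the receiving side."""
    flat = (q.astype(np.float32) * scale).ravel()
    flat = flat[: flat.size - pad]  # drop padding added before blocking
    pairs = flat.reshape(tuple(shape) + (2,))
    return (pairs[..., 0] + 1j * pairs[..., 1]).astype(np.complex64)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = (rng.standard_normal(4096) + 1j * rng.standard_normal(4096)).astype(np.complex64)
    q, scale, shape, pad = quantize_int8(x)
    y = dequantize_int8(q, scale, shape, pad)
    sent = q.nbytes + scale.nbytes
    print(f"bytes vs fp16 pairs: {sent} / {x.size * 4}")
    print(f"max abs error: {np.max(np.abs(y - x)):.3e}")
```

Relative to fp16 real/imaginary pairs (4 bytes per complex element), the int8 payload plus one fp32 scale per block is roughly half the size. The cost is a per-component error of at most half a quantization step, that is, the block's maximum magnitude divided by 254.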
Together, these three techniques allow the classical simulation to surpass the Sycamore quantum processor in both time-to-solution and energy consumption, showcasing the potential of supercomputers for tackling complex problems and for driving the development of green computing frameworks.
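Einsum expansion for complex half-precision data addresses the fact that fp16 tensor cores operate on real operands. The summary does not spell out the expansion, so the following is a minimal sketch of one standard way to express a complex contraction as a single real one: a constant tensor `M[c, d, e]` encodes complex multiplication on (real, imaginary) components, and folding it into one operand turns the product into a real GEMM over a doubled contraction dimension. The names `M`, `to_real`, and `complex_matmul_fp16` are hypothetical, NumPy stands in for GPU kernels, and fp16 storage with fp32 accumulation is only emulated.

```python
import numpy as np

# Hypothetical constant M[c, d, e]: for complex a = a0 + i*a1 and b = b0 + i*b1,
# (a*b)[e] = sum_{c,d} a[c] * b[d] * M[c, d, e], with e = 0 (real), 1 (imaginary).
M = np.zeros((2, 2, 2), dtype=np.float32)
M[0, 0, 0] = 1.0   # a_re * b_re contributes +1 to the real part
M[1, 1, 0] = -1.0  # a_im * b_im contributes -1 to the real part
M[0, 1, 1] = 1.0   # a_re * b_im contributes +1 to the imaginary part
M[1, 0, 1] = 1.0   # a_im * b_re contributes +1 to the imaginary part


def to_real(z):
    """Store a complex array as a trailing (real, imag) axis in float16."""
    return np.stack([z.real, z.imag], axis=-1).astype(np.float16)


def complex_matmul_fp16(a, b):
    """Compute a @ b (a: ni x nj, b: nj x nk, complex) as one real GEMM.

    Inputs are rounded to fp16, as they would be stored on the device, and the
    GEMM accumulates in fp32, emulating tensor cores with fp32 accumulation.
    """
    ni, nj = a.shape
    nk = b.shape[1]
    a_r = to_real(a).astype(np.float32)  # indices (i, j, c)
    b_r = to_real(b).astype(np.float32)  # indices (j, k, d)
    # Expand a with M: a_exp[i, j, d, e] = sum_c a[i, j, c] * M[c, d, e].
    a_exp = np.einsum("ijc,cde->ijde", a_r, M)
    # Fuse (i, e) into rows and (j, d) into the contraction dimension.
    lhs = a_exp.transpose(0, 3, 1, 2).reshape(2 * ni, 2 * nj)
    rhs = b_r.transpose(0, 2, 1).reshape(2 * nj, nk)
    out = (lhs @ rhs).reshape(ni, 2, nk)  # out[i, e, k]
    return out[:, 0, :] + 1j * out[:, 1, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((16, 32)) + 1j * rng.standard_normal((16, 32))
    b = rng.standard_normal((32, 8)) + 1j * rng.standard_normal((32, 8))
    err = np.max(np.abs(complex_matmul_fp16(a, b) - a @ b))
    print(f"max abs error vs complex128 reference: {err:.2e}")
```

Storing both parts in fp16 halves the footprint relative to complex64, and the contraction maps onto a real (2*ni x 2*nj) by (2*nj x nk) GEMM. Other decompositions, such as four separate real GEMMs on the real and imaginary parts, are equally valid; this summary does not say which one the authors use.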