PQPU: A 4.4-$\mu$J/Op 69.4-kOPS Agile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems
Yihong Zhu, Wenping Zhu, Yi Ouyang, Junwen Sun, Qi Zhao, Min Zhu, Jinjiang Yang, Chen, Qichao Tao, Hanning Wang, Guang Yang, Shaojun Wei, Aoyang Zhang, Leibo Liu
DOI: https://doi.org/10.1109/jssc.2024.3476949
IF: 5.4
2024-01-01
IEEE Journal of Solid-State Circuits
Abstract: Post-quantum cryptography (PQC) is currently being standardized to replace existing public-key cryptography for data security in the era of quantum computing. PQC algorithms differ considerably in their underlying mathematical problems, storage requirements, and computational patterns, which complicates the design of a unified PQC architecture. To address this issue, a unified PQC domain-specific accelerator (DSA), the post-quantum processing unit (PQPU), is proposed that balances performance and flexibility at the algorithm, architecture, and circuit levels. First, a task-clustering-based framework is proposed to enable task-level parallel execution by exploiting the inherent parallelism and common functions across different PQC algorithms. Second, a region-based, dynamically updated task path (TP) is constructed to facilitate automatic task-dependency management, with agile control flow and minimized overhead. Finally, algorithm-hardware co-optimizations are proposed in each task cluster to improve throughput and energy efficiency. Fabricated in a 28-nm process, PQPU achieves an energy efficiency of 4.4 $\mu$J/Op and a throughput of 69.4 kOPS. The energy-delay product (EDP) and throughput achieved are 19.3% and 44.6% compared to the state-of-the-art design, respectively. To the best of our knowledge, PQPU is the first silicon-proven PQC accelerator that supports all valid schemes in the NIST PQC standardization.
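As a rough illustration of how the energy-delay product (EDP) relates to the two headline figures, the sketch below multiplies energy per operation by delay per operation. It assumes that the delay per operation is simply the reciprocal of the quoted throughput and that both figures refer to the same operating point; the abstract states neither, so the resulting value is illustrative only and is not a number reported by the authors. The function names are hypothetical.

```python
# Headline figures quoted in the abstract.
ENERGY_PER_OP_J = 4.4e-6   # 4.4 uJ/Op
THROUGHPUT_OPS = 69.4e3    # 69.4 kOPS


def delay_per_op(throughput_ops: float) -> float:
    """Average time per operation in seconds.

    Assumption (not from the source): delay = 1 / throughput,
    i.e. operations are issued back to back at the quoted rate.
    """
    if throughput_ops <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / throughput_ops


def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """EDP in joule-seconds: energy per operation times delay per operation."""
    return energy_j * delay_s


delay = delay_per_op(THROUGHPUT_OPS)
edp = energy_delay_product(ENERGY_PER_OP_J, delay)

print(f"delay per op: {delay:.3e} s")
print(f"illustrative EDP: {edp:.3e} J*s")
```

A lower EDP is better, because it rewards designs that are both fast and energy-efficient rather than optimizing one at the expense of the other.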