16.2 A 28nm 69.4kOPS 4.4μJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems.

Yihong Zhu, Wenping Zhu, Yi Ouyang, Junwen Sun, Min Zhu, Qi Zhao, Jinjiang Yang, Chen Chen, Qichao Tao, Guang Yang, Aoyang Zhang, Shaojun Wei, Leibo Liu
DOI: https://doi.org/10.1109/ISSCC49657.2024.10454332
2024-01-01
Abstract: The migration towards post-quantum cryptography (PQC) is in progress to secure communications and transactions against the impending quantum threat, while one key-encapsulation mechanism (KEM) and three digital signature (DS) schemes are being standardized by NIST [1]. This multi-year migration poses serious challenges to PQC implementations for compatibility and performance requirements in various scenarios and settings: 1) Performance limitations caused by diverse computation patterns and relatively higher computation costs. 2) Crypto-agility resulting from different mathematical problems, various security parameters, or even different standardization bodies. 3) Domain-specific optimization to address the long-term security of the evolving algorithm families. However, most existing PQC accelerators [2, 3, 5] were only customized for specific algorithms based on unique mathematical problems. The latest configurable PQC processor [4] did not support Falcon and Sphincs+, which are being drafted for standardization. To address this issue, a versatile PQC processor fabricated in 28nm is presented with three key features: 1) task-clustering-based architecture for scalable processing with aggressive parallelism; 2) region-based task-path (TP) with dynamic update for agile cryptographic computing; 3) efficient PQC task-operators (TO), including hash/sample, format, floating-point/complex, encoding/decoding operators, for further improvements on throughput and energy-efficiency. Based on these contributions, the proposed chip supports all predominant schemes in NIST’s PQC standardization, while still delivering 44.6% and 10.3% improvements in the throughput and energy-delay product (EDP), respectively, relative to a state-of-the-art design [4].