Abstract: QUIC is a new protocol, standardized in 2021, designed to improve on the widely used TCP/TLS stack. Its main goal is to speed up web traffic via HTTP, but it is also used in other areas such as tunneling. Based on UDP, it offers features like reliable in-order delivery, flow and congestion control, stream-based multiplexing, and always-on encryption using TLS 1.3. Unlike TCP, QUIC integrates these capabilities in user space, relying on kernel interaction solely for UDP. Operating in user space allows more flexibility but sacrifices some of the kernel-level efficiency and optimization that TCP benefits from. Various QUIC implementations exist, each distinct in programming language, architecture, and design. QUIC is already widely deployed on the Internet and has been evaluated with a focus on low latency, interoperability, and standard compliance. However, benchmarks on high-speed network links are still scarce. This paper presents an extension to the QUIC Interop Runner, a framework for testing the interoperability of QUIC implementations. Our contribution enables reproducible QUIC benchmarks on dedicated hardware and high-speed links. We provide results on 10G links, including multiple implementations, evaluate how OS features like buffer sizes and NIC offloading impact QUIC performance, and show which data rates can be achieved with QUIC compared to TCP. Moreover, we analyze how different CPUs and CPU architectures influence reproducible and comparable performance measurements. Furthermore, our framework can be applied to evaluate the effects of future improvements to the protocol or the OS. Our results show that QUIC performance varies widely between client and server implementations, from around 50 Mbit/s to over 6000 Mbit/s. We show that the OS generally sets the default buffer size too small. Based on our findings, the buffer size should be increased by at least an order of magnitude.
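As a rough illustration of the buffer-size finding, the following Python sketch reads the default receive buffer of a UDP socket and then requests a larger one. It is not taken from the paper's framework: the function name `probe_udp_receive_buffer` and the 8 MiB request are assumptions chosen for illustration. The size actually granted depends on the operating system (on Linux, for example, the request is capped by `net.core.rmem_max`).

```python
import socket


def probe_udp_receive_buffer(requested_bytes: int) -> tuple[int, int]:
    """Return (default_size, effective_size) of a UDP socket's receive buffer.

    Hypothetical helper for illustration only; not part of any QUIC library.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Size the OS assigned when the socket was created.
        default_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # Ask for a larger buffer; the OS may cap or adjust the request.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # Read back what was actually granted.
        effective_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return default_size, effective_size


if __name__ == "__main__":
    default, effective = probe_udp_receive_buffer(8 * 1024 * 1024)
    print(f"default: {default} bytes, after request: {effective} bytes")
```

If the effective size stays well below the requested value, the system-wide limit probably needs to be raised before a QUIC implementation can benefit from a larger buffer.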
Our profiling analysis identifies packet I/O as the most expensive task for QUIC implementations. Furthermore, QUIC benefits less from AES-NI hardware acceleration, while AES-NI and NIC offloading both improve the goodput of TCP, to around 8000 Mbit/s. Because QUIC implementations lack support for NIC offloading, they miss opportunities for performance improvement. Our assessment of CPUs from different vendors and generations revealed significant performance variations. We employed core pinning to examine whether the performance of QUIC implementations is affected by the allocation to specific CPU cores. The results show a goodput increase of up to 20% when running on a specifically chosen core compared to a randomly assigned core. This outcome highlights the impact of CPU core selection not only on the performance of QUIC implementations but also on the reproducibility of measurements.
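Core pinning as described above can be sketched in a few lines of Python. This is a minimal, Linux-only illustration that assumes `os.sched_setaffinity` is available; the helper name `pin_to_core` is hypothetical and the sketch does not reproduce how the paper's framework pins processes.

```python
import os


def pin_to_core(core_id: int) -> set[int]:
    """Pin the calling process to one CPU core and return the new affinity set.

    Hypothetical helper for illustration; relies on the Linux-only
    os.sched_getaffinity / os.sched_setaffinity calls.
    """
    allowed = os.sched_getaffinity(0)  # cores this process may currently use
    if core_id not in allowed:
        raise ValueError(f"core {core_id} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core_id})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pin to the lowest-numbered core that is currently allowed.
    chosen = min(os.sched_getaffinity(0))
    print("now restricted to cores:", pin_to_core(chosen))
```

Child processes started afterwards inherit the affinity mask, so a benchmark runner could pin itself before launching the client or server under test.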
