Abstract: Emerging network-attached resource disaggregation architectures require ultra-low-latency rack-scale communication. However, current hardware-offloading (e.g., RDMA) and user-space (e.g., mTCP) communication schemes still rely on heavily layered protocol stacks, which require translation between the PCIe bus and the network protocol or complex connection and memory resource management within RNICs, inevitably adding latency overhead. We argue that the PCIe Non-Transparent Bridge (NTB) is a superior high-speed in-rack network technology for interconnecting PCIe-attached machines or devices over the same PCIe fabric, since no translation is needed between PCIe and a network protocol. We present NTSocks, the first user-space in-rack interconnect over PCIe fabric, which virtualizes native NTB into high-level network functionalities for rack-scale systems through software-hardware co-design. NTSocks provides (1) compatibility through a fast socket-like abstraction, (2) multi-thread scalability using a core-driven data-plane model, and (3) fair and efficient resource sharing with a multi-tenant isolation mechanism. Although PCIe NTB was originally designed for device communication across PCIe domains, NTSocks shows that a flexible user-level indirection can deliver performance close to bare-metal NTB while providing common network stack features. In evaluations with a latency-sensitive key-value store, NTSocks achieves up to 24.5× and 1.58× better latency than kernel and RDMA sockets, respectively.

Transaction-Aware Network-on-Chip Resource Reservation

Latency Criticality Aware On-Chip Communication

Extending On-Chip Interconnects for Rack-Level Remote Resource Access

Qswitch: Dynamical Off-Chip Bandwidth Allocation Between Local and Remote Accesses

Run-time Accelerate Channel for Communication-Aware Network-on-Chip

AlNiCo: SmartNIC-accelerated Contention-aware Request Scheduling for Transaction Processing

Bandwidth Allocation Approach for Network-on-Chip Based on Contention Probability

Self-optimizing Two-layer Network-on-Chip Based on Dominant Network-Flow Adaption

A Low-Cost and High-Throughput NoC-Aware Chip-to-Chip Interconnection

An Ultra-Low Latency and Compatible PCIe Interconnect for Rack-Scale Communication

A Low-Latency and Low-Power Hybrid Scheme for On-Chip Networks

A Tightly Coupled Network-on-Chip Router Architecture

Adaptive Congestion Control for Application Specific Networks-On-Chip

Work in Progress: ACAC: An Adaptive Congestion-aware Approximate Communication Mechanism for Network-on-Chip Systems

ORCA: A Network and Architecture Co-design for Offloading us-scale Datacenter Applications

Design Trade-Offs in Packetizing Mechanism for Network-on-Chip

TrafficLite: A Configurable On-Chip Interconnect Router Microarchitecture

A Low Latency Variance NoC Router

An Efficient Scheduling Mechanism With Flow-Based Packet Reordering In A High-Speed Network Processor

A Power-Efficient Network-On-Chip for Multi-Core Stream Processors