DeFT: A Deadlock-Free and Fault-Tolerant Routing Algorithm for 2.5D Chiplet Networks

Ebadollah Taheri, Sudeep Pasricha, Mahdi Nikdast
DOI: https://doi.org/10.48550/arXiv.2112.09234
2021-12-17
Abstract: By interconnecting smaller chiplets through an interposer, 2.5D integration offers a cost-effective and high-yield solution to implement large-scale modular systems. Nevertheless, the underlying network is prone to deadlock, despite deadlock-free chiplets, and to different faults on the vertical links used for connecting the chiplets to the interposer. Unfortunately, existing fault-tolerant routing techniques proposed for 2D and 3D on-chip networks cannot be applied to chiplet networks. To address these problems, this paper presents the first deadlock-free and fault-tolerant routing algorithm, called DeFT, for 2.5D integrated chiplet systems. DeFT improves the redundancy in vertical-link selection to tolerate faults in vertical links while considering network congestion. Moreover, DeFT can tolerate different vertical-link-fault scenarios while accounting for vertical-link utilization. Compared to the state-of-the-art routing algorithms in 2.5D chiplet systems, our simulation results show that DeFT improves network reachability by up to 75% with a fault rate of up to 25% and reduces the network latency by up to 40% for multi-application execution scenarios with less than 2% area overhead.
Emerging Technologies, Hardware Architecture
What problem does this paper attempt to address?
This paper addresses two main problems in 2.5D integrated chiplet systems: deadlock and low reliability caused by vertical-link faults. Specifically:

1. **Deadlock Avoidance**: The network in a 2.5D chiplet system is prone to deadlock even if the network within each chiplet is deadlock-free. When multiple chiplets are connected through an interposer, inter-chiplet packets can form cyclic dependencies, which lead to deadlock.
2. **Low Reliability**: The vertical links (VLs) that connect the chiplets to the interposer are prone to faults, which degrade the reliability and performance of the system. Existing fault-tolerant routing techniques cannot be directly applied to 2.5D chiplet systems because these systems are more irregular and need more path redundancy to support fault-tolerant routing.

To solve these problems, the paper proposes DeFT (Deadlock-Free and Fault-Tolerant), which it presents as the first deadlock-free and fault-tolerant routing algorithm for 2.5D chiplet systems. The main contributions of DeFT include:

- **Deadlock-free Routing**: A virtual network (VN) allocation strategy that provides deadlock-free routing and keeps the utilization of virtual channels (VCs) balanced.
- **Fault-Tolerant Routing**: A new dynamic vertical-link selection strategy that improves fault tolerance, balances the load across vertical links, and reduces network latency.

### Main Technical Details

#### 1. Virtual Network Separation and Deadlock-free Routing

DeFT uses two virtual networks (VN.0 and VN.1), each of which requires at least one virtual channel (VC). The rules are as follows:

- **Rule 1**: Routing from VN.1 to VN.0 is prohibited, but routing from VN.0 to VN.1 is allowed.
- **Rule 2**: For packets in VN.0, routing from the up port to the horizontal ports is prohibited.
- **Rule 3**: For packets in VN.1, routing from the horizontal ports to the down port is prohibited.
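Rules 1 to 3 can be read as a per-hop check that a router applies before forwarding a packet. The following is a minimal sketch of that reading, not the paper's implementation. The function name `turn_allowed`, the port labels, and the choice to evaluate Rules 2 and 3 against the virtual network the packet occupies on its outgoing channel are all assumptions made for illustration.

```python
# Hypothetical port labels; the summary only distinguishes up, down and horizontal.
HORIZONTAL = {"north", "east", "south", "west"}


def turn_allowed(vn_in: int, port_in: str, vn_out: int, port_out: str) -> bool:
    """Return True if a hop is permitted under Rules 1 to 3.

    Assumption: Rules 2 and 3 are checked against the VN of the
    outgoing channel (vn_out), which the summary does not specify.
    """
    # Rule 1: VN.1 -> VN.0 is prohibited; VN.0 -> VN.1 is allowed.
    if vn_in == 1 and vn_out == 0:
        return False
    # Rule 2: in VN.0, no routing from the up port to a horizontal port.
    if vn_out == 0 and port_in == "up" and port_out in HORIZONTAL:
        return False
    # Rule 3: in VN.1, no routing from a horizontal port to the down port.
    if vn_out == 1 and port_in in HORIZONTAL and port_out == "down":
        return False
    return True
```

Under these assumptions, a packet arriving on the up port in VN.0 may continue horizontally only if it moves to VN.1 on that hop, and once in VN.1 it can never return to VN.0.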
Rules 1 to 3 ensure that packets routed in the two virtual networks cannot form cyclic dependencies, which avoids deadlock.

#### 2. Fault-Tolerant and Congestion-Aware Vertical Link Selection

DeFT also proposes a dynamic selection strategy that takes both vertical-link faults and network congestion into account. It has two steps:

- **Offline Analysis**: At design time, determine the optimal vertical-link selections for the different vertical-link fault scenarios and store the results in each router's lookup table.
- **Online Selection**: At runtime, select the best vertical link from the pre-analyzed options according to the current fault state and network traffic.

With this strategy, DeFT maintains high reachability when vertical links fail and reduces network latency under heavy traffic.

### Experimental Results

The paper evaluates DeFT through simulation:

- **Latency Analysis**: Under synthetic traffic, DeFT has the lowest average latency, especially for local and hot-spot traffic patterns.
- **Fault-Tolerance Analysis**: With faulty vertical links, DeFT achieves up to 100% reachability, while existing algorithms tolerate only a few faults at the same fault rate.
- **Real Application Traffic**: Under real application traffic, DeFT reduces latency by up to 40% when multiple applications run simultaneously.

In conclusion, by combining virtual network separation with dynamic vertical-link selection, DeFT addresses both deadlock and vertical-link faults in 2.5D chiplet systems and improves the reliability and performance of the system.
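The offline/online vertical-link selection summarized above can be illustrated with a small sketch. It is not the paper's algorithm: the summary does not say how the offline analysis ranks options, so ranking healthy links by Manhattan distance is an assumption. The congestion metric (a per-link occupancy count), the limit of two candidates, and the names `build_lookup_table` and `select_vl` are also assumptions.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_lookup_table(router_xy, vl_positions):
    """Offline step: for each fault scenario, rank the healthy VLs.

    Assumption: links are ranked by distance from the router,
    with the link id as a tie-breaker.
    """
    table = {}
    vl_ids = sorted(vl_positions)
    # Enumerate scenarios that leave at least one VL healthy.
    for k in range(len(vl_ids)):
        for faulty in combinations(vl_ids, k):
            healthy = [v for v in vl_ids if v not in faulty]
            healthy.sort(key=lambda v: (manhattan(router_xy, vl_positions[v]), v))
            table[frozenset(faulty)] = healthy
    return table


def select_vl(table, faulty_now, occupancy, max_candidates=2):
    """Online step: pick the least congested of the top-ranked healthy VLs."""
    candidates = table[frozenset(faulty_now)][:max_candidates]
    # min() keeps the first (closest) candidate when occupancies tie.
    return min(candidates, key=lambda v: occupancy.get(v, 0))
```

For example, with `table = build_lookup_table((0, 0), {"a": (0, 1), "b": (2, 0), "c": (3, 3)})`, the call `select_vl(table, set(), {"a": 5, "b": 1})` chooses `"b"` because the closer link `"a"` is more congested, while `select_vl(table, {"a", "b"}, {})` falls back to `"c"`, the only healthy link. Enumerating every fault scenario grows exponentially with the number of links, so this form only suits the small number of vertical links per chiplet assumed here.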