An Efficient Bottleneck Planes Exclusion Method for Reconfiguring 3D VLSI Arrays

Junyan Qian, Kunzhu Qiu, Hao Ding, Huimin Zhang, Zhongyi Zhai
DOI: https://doi.org/10.1109/TPDS.2023.3339961
IF: 5.3
2024-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: With the ever-increasing integration and parallel computing capabilities of 3D processor arrays, failures of processing elements (PEs) caused by various factors have become more prevalent. A fault-tolerant mechanism that uses the remaining fault-free PEs to reconfigure a sub-array is therefore critical. In this paper, we study the problem of reconfiguring a 3D sub-array with as many fault-free PEs as possible, which has been shown to be NP-complete in previous work. Although prior algorithms are effective under low fault densities, they are severely limited under high fault densities. To address this, we first define the bottleneck of the 3D processor array, propose a novel method to identify the physical bottleneck plane that restricts the reconfigurable size of the logical sub-array, and prove its correctness. Then, we propose an effective compensation strategy that can fully utilize the fault-free PEs in the bottleneck plane. Under this strategy, a sliding-window weight calculation method is proposed to determine the priority of compensation. Finally, we propose a heuristic algorithm that can construct the maximum target array from different dimensions in polynomial time. Experimental results demonstrate that the proposed algorithm performs favorably in terms of harvest and degradation. For the random fault model, the improvement in harvest is up to 32.03% on a 32×32×32 host array with a 20% fault density. For the clustered fault model, the improvement in harvest is up to 70.63% on a 32×32×32 host array containing 12 fault clusters of size 6×6×6.