CLIM: A Cross-Level Workload-Aware Timing Error Prediction Model for Functional Units

Xun Jiao, Abbas Rahimi, Yu Jiang, Jianguo Wang, Hamed Fatemi, Jose Pineda de Gyvez, Rajesh K. Gupta
DOI: https://doi.org/10.1109/tc.2017.2783333
IF: 3.183
2018-01-01
IEEE Transactions on Computers
Abstract: Timing errors, which are caused by timing violations of sensitized circuit paths, have emerged as an important threat to the reliability of synchronous digital circuits. To protect circuits from these timing errors, designers typically use a conservative timing margin, which leads to operational inefficiency. Existing adaptive approaches reduce such conservative margins by predicting timing errors in advance and adjusting the timing margin adaptively. However, these error prediction approaches overlook the impact of input workload (i.e., operands) on path sensitization, resulting in a loss of accuracy. The diversity of input operands leads to complex path sensitization behaviors, making them hard to represent in timing error modeling. In this paper, we propose CLIM, a cross-level workload-aware timing error prediction model for functional units (FUs). CLIM predicts whether there are timing errors in an FU at two levels: bit-level and value-level. At the bit level, CLIM predicts each output bit, and at the value level the entire output value, as one of two classes, {timing correct, timing erroneous}, as a function of input workload and clock period. We apply supervised learning methods to construct CLIM, using input operands, computation history, and circuit toggling as input features and the outputs' timing classes as labels. The training data are collected from gate-level simulations (GLS) of post place-and-route designs in a TSMC 45nm process. We evaluate CLIM prediction accuracy for various FUs and compare it with baseline models. On average, CLIM exhibits 95 percent prediction accuracy at value-level and 97 percent at bit-level, and executes 173X faster than GLS. We utilize CLIM to analyze the value-level and bit-level reliability of FUs under random and real-world application workloads. At value-level, CLIM-based reliability estimation is within 2.8 percent deviation on average of detailed GLS ground truth.
At bit-level, we introduce the concept of a bit-level reliability specification for error-tolerant applications and compare it with the CLIM-based bit-level reliability estimation. Based on this comparison, CLIM classifies the application quality into two classes: {acceptable, non-acceptable}. On average, 97 percent of application quality classifications are consistent with GLS ground truth.
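The feature and label construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 8-bit width, the two-operand FU, the function names, and the use of an XOR between consecutive operands as a stand-in for "circuit toggling" are all assumptions made for illustration. Labels are assumed to come from comparing an error-free (golden) output with the output sampled at a given clock period. Clock period itself is left out here; it could be added as an extra feature or handled by training one model per period.

```python
from typing import List, Tuple

WIDTH = 8  # hypothetical operand/output bit width


def to_bits(value: int, width: int = WIDTH) -> List[int]:
    """Return the little-endian bit list of an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]


def build_features(curr: Tuple[int, int], prev: Tuple[int, int],
                   width: int = WIDTH) -> List[int]:
    """Concatenate current operands, previous operands, and their toggles."""
    feats: List[int] = []
    for v in curr:                       # input operands
        feats += to_bits(v, width)
    for v in prev:                       # computation history (assumed: previous cycle)
        feats += to_bits(v, width)
    for c, p in zip(curr, prev):         # toggling (assumed: XOR of consecutive operands)
        feats += to_bits(c ^ p, width)
    return feats


def bit_level_labels(golden: int, sampled: int, width: int = WIDTH) -> List[int]:
    """One label per output bit: 1 = timing erroneous, 0 = timing correct."""
    return to_bits(golden ^ sampled, width)


def value_level_label(golden: int, sampled: int) -> int:
    """One label for the whole output value: 1 = timing erroneous."""
    return int(golden != sampled)


# Hypothetical adder example: 15 + 1 = 16, but the carry arrives too late.
features = build_features(curr=(15, 1), prev=(14, 1))
bit_labels = bit_level_labels(golden=16, sampled=0)
value_label = value_level_label(golden=16, sampled=0)
```

Feature vectors and labels of this shape could then be passed to any binary classifier, with one classifier per output bit for bit-level prediction and a single classifier for value-level prediction.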