In-Place Zero-Space Memory Protection for CNN

Hui Guan, Lin Ning, Zhen Lin, Xipeng Shen, Huiyang Zhou, Seung-Hwan Lim
DOI: https://doi.org/10.48550/arXiv.1910.14479
2019-10-31
Abstract: Convolutional Neural Networks (CNN) are being actively explored for safety-critical applications such as autonomous vehicles and aerospace, where it is essential to ensure the reliability of inference results in the presence of possible memory faults. Traditional methods such as error correction codes (ECC) and Triple Modular Redundancy (TMR) are CNN-oblivious and incur substantial memory overhead and energy cost. This paper introduces in-place zero-space ECC, assisted by a new training scheme, weight distribution-oriented training. The new method provides the first known zero-space-cost memory protection for CNNs without compromising the reliability offered by traditional ECC.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to make convolutional neural networks (CNNs) tolerant of memory faults, and so keep inference results reliable, without any additional storage cost when they are deployed in safety-critical fields such as self-driving vehicles and aerospace. Traditional memory protection methods, such as error-correcting codes (ECC) and triple modular redundancy (TMR), offer protection but add substantial memory overhead and energy consumption, which is a significant burden for resource-constrained devices such as mobile devices. The paper therefore proposes in-place zero-space ECC, combined with a new training scheme called Weight Distribution-Oriented Training (WOT), to achieve memory protection at zero space cost while maintaining the same reliability as traditional ECC. Specifically, the paper takes three steps:

1. **Utilize the characteristics of CNN weights**: The weights of a well-trained CNN are mostly small, and the absolute value of most weights is less than 64. When each weight is stored in 8 bits, such a small value needs only 7 of them, so the remaining bit carries no information and can be used for other purposes, such as error correction.
2. **Introduce the WOT training scheme**: WOT adjusts the weight distribution so that the first 7 weights in each 64-bit data block (eight 8-bit weights) fall within the range [-64, 63]. The non-informative bit of each of these 7 weights can then hold an ECC check bit.
3. **Design an in-place ECC protection mechanism**: Embedding the ECC check bits in the weight data itself avoids any additional space overhead. The method removes the space cost while keeping the same error-correction capability as traditional ECC.
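The bit-level idea in the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the choice of bit 6 (the bit just below the sign bit) as the non-informative bit is an assumption, and the computation of the check bits themselves is left out, so they are passed in as plain 0/1 values. The sketch relies on one fact: for an 8-bit two's-complement value in [-64, 63], bit 6 always equals the sign bit, so it can be overwritten and later restored.

```python
from typing import List, Sequence, Tuple

LOW, HIGH = -64, 63  # values representable in 7-bit two's complement


def block_is_protectable(block: Sequence[int]) -> bool:
    """True if the block has 8 int8 weights and the first 7 lie in [-64, 63]."""
    return (
        len(block) == 8
        and all(-128 <= w <= 127 for w in block)
        and all(LOW <= w <= HIGH for w in block[:7])
    )


def embed_check_bits(block: Sequence[int], check_bits: Sequence[int]) -> bytes:
    """Store 7 check bits in bit 6 of the first 7 weight bytes (assumed layout)."""
    if not block_is_protectable(block):
        raise ValueError("first 7 weights must lie in [-64, 63]")
    if len(check_bits) != 7 or any(b not in (0, 1) for b in check_bits):
        raise ValueError("need exactly 7 check bits, each 0 or 1")
    out = [w & 0xFF for w in block]  # two's-complement byte view
    for i, bit in enumerate(check_bits):
        out[i] = (out[i] & ~0x40 & 0xFF) | (bit << 6)  # overwrite bit 6
    return bytes(out)


def extract(stored: bytes) -> Tuple[List[int], List[int]]:
    """Recover (weights, check_bits) from a stored 8-byte block."""
    check_bits = [(b >> 6) & 1 for b in stored[:7]]
    weights = []
    for i, b in enumerate(stored):
        if i < 7:
            sign = (b >> 7) & 1
            b = (b & ~0x40 & 0xFF) | (sign << 6)  # bit 6 mirrors the sign bit
        weights.append(b - 256 if b >= 128 else b)  # back to signed int8
    return weights, check_bits


if __name__ == "__main__":
    block = [-64, 63, 0, -1, 5, -20, 12, 100]  # the 8th weight may be large
    check = [1, 0, 1, 1, 0, 0, 1]              # placeholder check bits
    stored = embed_check_bits(block, check)
    print(len(stored), extract(stored) == (block, check))
```

The stored block is still 8 bytes, which is the sense in which the protection costs no extra space; a block whose first 7 weights fall outside [-64, 63] is rejected, which is the condition WOT is described as enforcing during training.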
The experimental results show that this method effectively reduces the impact of memory faults on inference results across different CNN models. Its performance is comparable to that of traditional ECC under various fault-rate settings, but it incurs no additional space overhead. This provides a new solution for improving the reliability and energy efficiency of CNNs in safety-critical applications.