What is it about?

Soft errors, or transient bit flips, can cause unexpected behavior in software programs. Traditional methods against soft errors incur large runtime or memory overheads, making them impractical for neural network applications with strict resource constraints. This paper introduces an efficient protection technique that can correct any single fault in a convolutional neural network (CNN), regardless of the fault's location and timing.


Why is it important?

As neural networks are deployed even in safety-critical applications, malfunctions in the networks can lead to catastrophic consequences. We therefore apply an algorithm-based fault tolerance (ABFT) method, combined with the idea of Hamming codes, to correct faults in the weights and biases within each layer of the network. We also add a carefully crafted duplication-based roll-back recovery for faults in the intermediate inputs and outputs between layers. This allows us to achieve near-perfect fault coverage with 27% lower runtime overhead and minimal memory overhead compared to traditional TMR (triple modular redundancy)-based methods.
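To illustrate the general ABFT idea behind checksum-based correction of weights, here is a minimal sketch: row and column checksums over a weight matrix detect a mismatch, the intersection of the mismatched row and column locates the corrupted element, and the checksum restores its value. This is a generic ABFT example for intuition only, not the paper's exact Hamming-code-based scheme; all function names are hypothetical.

```python
# Generic ABFT-style single-fault correction on a 2-D weight matrix.
# Not the paper's actual algorithm; a textbook checksum sketch.

def make_checksums(w):
    """Compute per-row and per-column sums of a 2-D weight matrix."""
    rows = [sum(r) for r in w]
    cols = [sum(c) for c in zip(*w)]
    return rows, cols

def correct_single_fault(w, rows, cols, tol=1e-6):
    """Locate and fix at most one corrupted element in place.

    Returns (i, j) of the corrected element, or None if no fault is found.
    """
    cur_rows, cur_cols = make_checksums(w)
    bad_r = [i for i, (a, b) in enumerate(zip(cur_rows, rows)) if abs(a - b) > tol]
    bad_c = [j for j, (a, b) in enumerate(zip(cur_cols, cols)) if abs(a - b) > tol]
    if not bad_r or not bad_c:
        return None                       # no correctable data fault detected
    i, j = bad_r[0], bad_c[0]             # fault sits at the intersection
    w[i][j] -= cur_rows[i] - rows[i]      # subtract the fault's delta
    return (i, j)

# Usage: protect a small weight matrix, inject a fault, then recover.
W = [[0.5, -1.0, 2.0],
     [1.5,  0.25, -0.75]]
rsum, csum = make_checksums(W)
W[1][2] += 8.0                            # simulated bit-flip corruption
loc = correct_single_fault(W, rsum, csum)
```

The same principle scales to convolution and fully connected layers, where checksums can be folded into the layer computation itself so that detection adds little extra work.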

Read the Original

This page is a summary of: Maintaining Sanity: Algorithm-based Comprehensive Fault Tolerance for CNNs, June 2024, ACM (Association for Computing Machinery),
DOI: 10.1145/3649329.3657355.
