What is it about?
Fault injection in early microarchitecture-level simulation CPU models and beam experiments on the final physical CPU chip are two established methodologies to access the soft error reliability of a microprocessor at different stages of its design flow. Beam experiments, on one hand, estimate the devices expected soft error rate in realistic physical conditions by exposing it to accelerated particles fluxes. Fault injection in microarchitectural models of the processor, on the other hand, provides deep insights on faults propagation through the entire system stack, including the operating system. Combining beam experiments and fault injection data can deliver deep insights about the devices expected reliability when deployed in the field. However, it is yet largely unclear if the fault injection error rates can be compared to those reported by beam experiments and how this comparison can lead to informed soft error protection decisions in early stages of the system design. In this paper, we present and analyze data gathered with extensive beam experiments (on physical CPU hardware) and microarchitectural fault injections (on an equivalent CPU model on Gem5) performed with 13 different benchmarks executed on top of Linux on an ARM Cortex-A9 microprocessor. We combine experimental data that cover more than 2.9 million years of natural exposure with the result of more than 80,000 injections. We then compare the soft error rate estimations that are based on neutron beam and fault injection experiments. We show that, for most benchmarks, fault injection can be very accurately used to predict the Silent Data Corruptions (SDCs) rate and the Application Crash rate. The System Crash rate measured with beam experiments, however is much larger than the one estimated by fault injection due to unknown proprietary parts of the physical hardware platform that can't be modeled in the simulator. Overall, our analysis shows that the relative difference between the total error rates of the beam experiments and the fault injection experiments is limited within a narrow range of values and is always smaller than one order of magnitude. This narrow range of the expected failure rate of the CPU provides invaluable assistance to the designers in making effective soft error protection decisions in early design stages.
Featured Image
Photo by S Sjöberg on Unsplash
Read the Original
This page is a summary of: Demystifying Soft Error Assessment Strategies on ARM CPUs: Microarchitectural Fault Injection vs. Neutron Beam Experiments, June 2019, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/dsn.2019.00018.
You can read the full text:
Contributors
The following have contributed to this page







