What is it about?

In this paper, we revisit the system vulnerability stack for transient faults. We reveal severe pitfalls in widely used vulnerability measurement approaches that treat the hardware and software layers separately. We rely on microarchitecture-level fault injection to derive very tight full-system vulnerability measurements. For our architectural and microarchitectural measurements, we employ GeFIN, a state-of-the-art fault injector built on top of the gem5 simulator; for software-level measurements, we employ the LLFI fault injector. Analyzing two different Arm ISAs and two different microarchitectures for each ISA, we quantify the sources and the magnitude of error in architecture- and software-level vulnerability evaluation methods that aim to reproduce the effects of hardware faults. We show that widely applied methodologies for system resilience evaluation fail to capture important aspects of fault manifestation and propagation, and lead to misleading findings that contradict the vulnerability results of a comprehensive cross-layer analysis. To demonstrate the validity of our findings, we apply a state-of-the-art software-based fault tolerance technique and evaluate its impact at all layers through a case study. Our evaluation shows that although higher-level methods can report significant vulnerability improvements (up to 3.8x vulnerability reduction), the actual cross-layer vulnerability of the protected system can be degraded (increased) by up to 30% for the selected benchmarks. Our analysis firmly suggests that only accurate methodologies for full-system vulnerability evaluation of a microprocessor can guide informed transient-fault protection decisions, either at the hardware or at the software layer.
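To make the idea of a fault injection measurement concrete, here is a minimal sketch of statistical fault injection in its simplest form: flip one bit of program state at a random point, rerun the workload, compare the result with a fault-free (golden) run, and classify the outcome as masked, silent data corruption (SDC), or crash. Everything in it is a hypothetical illustration: the toy workload, the three "registers", and the names `run`, `classify`, and `campaign` are assumptions for this example. It does not reflect how GeFIN (which injects into microarchitectural structures in gem5) or LLFI (which injects at the LLVM IR level) are implemented.

```python
import random
from collections import Counter

# Hypothetical toy workload (not GeFIN or LLFI): sums an array using three
# "registers": r0 = index, r1 = accumulator, r2 = scratch (never read).
DATA = [3, 1, 4, 1, 5, 9, 2, 6]
NUM_REGS = 3
REG_BITS = 32
MASK = (1 << REG_BITS) - 1


def run(fault=None):
    """Execute the workload; fault = (step, reg, bit) flips one bit before that step."""
    regs = [0] * NUM_REGS
    for step in range(len(DATA)):
        if fault is not None and fault[0] == step:
            _, reg, bit = fault
            regs[reg] ^= 1 << bit  # single-bit transient fault
        idx = regs[0]
        if idx >= len(DATA):
            raise IndexError("out-of-bounds access")  # models a crash
        regs[1] = (regs[1] + DATA[idx]) & MASK
        regs[0] = idx + 1
    return regs[1]


def classify(golden, fault):
    """Compare a faulty run against the golden output."""
    try:
        out = run(fault)
    except IndexError:
        return "crash"
    return "masked" if out == golden else "sdc"


def campaign(n, seed=0):
    """Inject n random single-bit faults and count each outcome class."""
    rng = random.Random(seed)
    golden = run()
    counts = Counter()
    for _ in range(n):
        fault = (rng.randrange(len(DATA)), rng.randrange(NUM_REGS), rng.randrange(REG_BITS))
        counts[classify(golden, fault)] += 1
    return counts


counts = campaign(3000)
# Vulnerability estimate: fraction of injected faults that are not masked.
vulnerability = (counts["sdc"] + counts["crash"]) / sum(counts.values())
print(dict(counts), f"vulnerability = {vulnerability:.2f}")
```

Because this toy model only perturbs software-visible state, it can only observe faults that have already reached the program; faults that are masked or transformed inside microarchitectural structures never appear in it, which is the kind of blind spot the summary attributes to higher-level evaluation methods.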

Why is it important?

We demystify, at the finest possible granularity, how hardware faults propagate all the way to the application's output. We challenge fundamental assumptions underlying multiple recent studies that simplistically rely on software- or architecture-level fault injection to assess the effects of hardware faults and the effectiveness of fault tolerance schemes.

Read the Original

This page is a summary of: Demystifying the System Vulnerability Stack: Transient Fault Effects Across the Layers, June 2021, Institute of Electrical & Electronics Engineers (IEEE).
DOI: 10.1109/isca52012.2021.00075.