What is it about?
Modern processors guess which way a branch will go so they don't have to wait for it to be resolved. This is called branch prediction. When chip designers add pipeline stages to a processor to raise its clock speed, the predictor's record of recent branch outcomes (the global history register, or GHR) falls further behind, so it guesses wrong more often. The faster clock makes up for some of this loss, but our measurements show that on certain workloads a deeper pipeline still ends up worse off. We built a small hardware fix called Speculative GHR Forwarding that keeps the predictor's view of branch history up to date while earlier branches are still travelling through the pipeline. On a Xilinx FPGA running standard benchmarks such as CoreMark, our fix cuts wrong guesses by 31 percent, at less than 1 percent extra hardware cost and with no loss in clock speed.
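The effect of stale history can be illustrated with a toy simulation. The sketch below is not the paper's hardware design. It is a minimal trace-driven model that assumes a gshare-style predictor with 2-bit counters, a synthetic two-branch workload, and a pipeline flush on every misprediction. All names and parameters (`simulate`, `make_trace`, `HIST_BITS`, `TABLE_BITS`, the branch addresses) are hypothetical. With `forward=False` the predictor is indexed using only the outcomes of resolved branches. With `forward=True` the predicted directions of in-flight branches are shifted into the history as well.

```python
import random
from collections import deque

HIST_BITS = 4    # global history length (illustrative assumption)
TABLE_BITS = 8   # 256 two-bit counters (illustrative assumption)


def simulate(trace, delay, forward):
    """Count mispredictions of a gshare-style predictor on `trace`.

    trace   : list of (pc, taken) pairs in program order, taken is 0 or 1
    delay   : number of younger branches fetched before a branch resolves
    forward : if True, predictions of in-flight branches are shifted into
              the history used for the table lookup
    """
    hist_mask = (1 << HIST_BITS) - 1
    table_mask = (1 << TABLE_BITS) - 1
    table = [1] * (1 << TABLE_BITS)  # 2-bit counters, start weakly not-taken
    committed = 0                    # history of resolved outcomes only
    inflight = deque()               # (trace position, table index, prediction)
    mispredicts = 0
    i = 0
    while i < len(trace) or inflight:
        if len(inflight) > delay or i == len(trace):
            # Resolve the oldest in-flight branch and train the predictor.
            pos, idx, pred = inflight.popleft()
            taken = trace[pos][1]
            if taken:
                table[idx] = min(3, table[idx] + 1)
            else:
                table[idx] = max(0, table[idx] - 1)
            committed = ((committed << 1) | taken) & hist_mask
            if pred != taken:
                # Misprediction: squash younger branches and refetch them.
                mispredicts += 1
                inflight.clear()
                i = pos + 1
            continue
        # Fetch the next branch and predict it.
        pc = trace[i][0]
        hist = committed
        if forward:
            for _, _, p in inflight:
                hist = ((hist << 1) | p) & hist_mask
        idx = ((pc >> 2) ^ hist) & table_mask
        inflight.append((i, idx, int(table[idx] >= 2)))
        i += 1
    return mispredicts


def make_trace(n_pairs, seed=0):
    """Synthetic workload: branch B always goes the same way as branch A."""
    rng = random.Random(seed)
    trace = []
    for _ in range(n_pairs):
        r = rng.randint(0, 1)
        trace.append((0x40, r))  # branch A: random direction
        trace.append((0x80, r))  # branch B: repeats A's direction
    return trace


if __name__ == "__main__":
    trace = make_trace(20000)
    for depth in (0, 3):
        stale = simulate(trace, depth, forward=False)
        fwd = simulate(trace, depth, forward=True)
        print(f"delay={depth}: stale={stale} forwarded={fwd}")
```

In this model branch A is unpredictable by construction, so both variants pay for it. Branch B is only predictable if A's direction is already in the history at lookup time, which is what the stale variant loses once `delay` is greater than zero.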
Why is it important?
Engineers building FPGA-based processors often need to add pipeline stages to hit higher clock speeds. They then discover the deeper design predicts branches worse than the shallower one, and they live with the loss. We explain exactly why this happens, measure how large the penalty is across five pipeline depths, and provide a small hardware addition that removes the penalty completely. Anyone building an FPGA soft processor with three or more stages of branch-resolution delay can drop in our fix and recover the lost accuracy at negligible cost.
Perspectives
This work matters to anyone building custom processors on FPGAs, including soft cores used in research, education, and embedded systems, and as front-ends for hardware accelerators. It also offers a measurement lesson: the depth-dependent misprediction penalty is a fixable problem, distinct from the inherent flush penalty, and the two should be reported separately.
Devansh Joshi
Read the Original
This page is a summary of: Speculative GHR Forwarding, May 2026, Open Engineering Inc, DOI: 10.31224/7181.