Question

Assume a five-stage single-pipeline microarchitecture (fetch,
decode, execute, memory, write-back) and the code given below. All
ops are one cycle except LW and SW, which are 1+2 cycles, and
branches, which are 1+1 cycles. There is no forwarding. Show the
phases of each instruction per clock cycle for one iteration of the
loop.

Loop: lw   x1, 0(x2)
      addi x1, x1, 1
      sw   x1, 0(x2)
      addi x2, x2, 4
      sub  x4, x3, x2
      bnz  x4, Loop

- How many clock cycles per loop iteration are lost to branch overhead?
- Assume a static branch predictor, capable of recognizing a backward branch in the Decode stage. Now how many clock cycles are wasted on branch overhead?
- Assume a dynamic branch predictor. How many cycles are lost on a correct prediction?
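
Because there is no forwarding, a useful first step before drawing the per-cycle diagram is to list the read-after-write (RAW) dependencies inside the loop body, since each one is a candidate for stall cycles. The Python sketch below does only that; it does not compute the stall counts or answer the questions above. The `LOOP` table and the `raw_dependencies` helper are illustrative names, not part of the original question, and the sketch assumes that `sw` and `bnz` write no register and that only a single iteration is considered (so the `x2` value carried in from the previous iteration is ignored).

```python
# Loop body from the question; each entry is (opcode, destination, sources).
# Assumption: sw and bnz write no register, so their destination is None.
LOOP = [
    ("lw",   "x1", ["x2"]),
    ("addi", "x1", ["x1"]),
    ("sw",   None, ["x1", "x2"]),
    ("addi", "x2", ["x2"]),
    ("sub",  "x4", ["x3", "x2"]),
    ("bnz",  None, ["x4"]),
]


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for each RAW pair.

    Only the most recent writer of a register counts as its producer,
    and only writers within this one iteration are tracked.
    """
    last_writer = {}
    deps = []
    for i, (_, dest, sources) in enumerate(program):
        # Check the reads first, so an instruction like "addi x1,x1,1"
        # depends on the earlier writer of x1 and not on itself.
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


for producer, consumer, reg in raw_dependencies(LOOP):
    print(f"{LOOP[producer][0]} (#{producer}) -> "
          f"{LOOP[consumer][0]} (#{consumer}) via {reg}")
```

Every dependency it reports is between adjacent instructions, so under the no-forwarding assumption each consumer has to wait for its producer's write-back before it can read the register. How many cycles that costs depends on how the 1+2 and 1+1 latencies are mapped onto stages.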

Answer #1

