Pipelining executes several instructions concurrently, in overlapping stages, to increase the instruction throughput of a program. You have learnt the MIPS (Microprocessor without Interlocked Pipeline Stages) processor pipeline, which contains five stages (IF, ID, EXE, MEM, and WB). Propose a three-stage pipeline (Fetch, Decode, Execute), describe the changes your scheme requires, and explain how it differs from the original five-stage pipeline. Draw the three stages, illustrating the function of each stage.
Sequence of instructions:
1. lw $s2, 0($s1)
2. lw $s1, 40($s6)
3. sub $s6, $s1, $s2
4. add $s6, $s2, $s2
5. or $s3, $s6, $zero
6. sw $s6, 50($s1)
1. Data dependencies: A data dependence is a dependence of one instruction B on another
instruction A because the value produced by A is read by B. So when listing data
dependencies, you need to state A, B, and the location (in this case, the register
that causes the dependence). It also doesn't matter how far apart the two instructions are.
Don't confuse data dependencies with hazards!
Dependencies:
3 depends on 1 ($s2)
3 depends on 2 ($s1)
4 depends on 1 ($s2)
5 depends on 4 ($s6)
6 depends on 2 ($s1)
6 depends on 4 ($s6)
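If you want to check a list like this mechanically, here is a minimal Python sketch. It assumes you transcribe each instruction by hand into a destination register and a list of source registers; the names `PROGRAM` and `raw_dependencies` are made up for this illustration. It records, for every register an instruction reads, the most recent earlier instruction that wrote it.

```python
# Each entry: (text, destination register or None, source registers).
# The register lists are transcribed by hand from the six instructions above.
PROGRAM = [
    ("lw  $s2, 0($s1)",     "$s2", ["$s1"]),
    ("lw  $s1, 40($s6)",    "$s1", ["$s6"]),
    ("sub $s6, $s1, $s2",   "$s6", ["$s1", "$s2"]),
    ("add $s6, $s2, $s2",   "$s6", ["$s2"]),
    ("or  $s3, $s6, $zero", "$s3", ["$s6", "$zero"]),
    ("sw  $s6, 50($s1)",    None,  ["$s6", "$s1"]),  # a store writes no register
]

def raw_dependencies(program):
    """Return sorted (consumer, producer, register) triples, numbered from 1."""
    deps = []
    last_writer = {}  # register -> number of the latest instruction that wrote it
    for number, (_text, dest, sources) in enumerate(program, start=1):
        for reg in sources:
            if reg in last_writer:
                deps.append((number, last_writer[reg], reg))
        if dest is not None:
            last_writer[dest] = number
    return sorted(deps)

for consumer, producer, reg in raw_dependencies(PROGRAM):
    print(f"{consumer} depends on {producer} ({reg})")
```

Note that the distance between the two instructions plays no role here, which matches the point above: a dependence exists whether or not it ends up causing a hazard.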
2. Assume the 5-stage MIPS pipeline with no forwarding, and each stage takes 1 cycle.
Instead of inserting nops, you let the processor stall on hazards. How many times does
the processor stall? How long is each stall (in cycles)? What is the execution time (in
cycles) for the whole program?
Ignoring the stalls for a moment, the program takes 10 cycles to execute. It is not 6, because
during the first 4 cycles the processor doesn't commit (complete) an instruction; for those
4 cycles, the pipeline is still filling up. It is also not 30, since when the first instruction
commits, the second instruction is almost done and will commit in the next cycle. Remember,
pipelines allow multiple instructions to execute at the same time.
With the stalls, there are only two stalls: after the second load, and after the add. Both
occur because the next instruction needs the value being produced. Without forwarding,
this means the next instruction will be stuck in the fetch stage until the previous
instruction writes back. These are 2-cycle stalls (when in doubt, draw diagrams like
the ones on the Implementing MIPS slides, slide 63). So to answer the question: 2
stalls, 2 cycles each, and the total is 10 + 2 ∗ 2 = 14 cycles to execute the program.
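The cycle count above can be reproduced with a rough timing model in Python. This is only a sketch under stated assumptions, not a real pipeline simulator: it assumes an instruction can read a register in the same cycle its producer is in write-back (which is what makes each stall 2 cycles), it tracks only the cycle in which each instruction reads its registers, and the names `PROGRAM` and `simulate_no_forwarding` are made up for this illustration.

```python
# Each entry: (destination register or None, source registers), in program order:
# lw, lw, sub, add, or, sw.
PROGRAM = [
    ("$s2", ["$s1"]),
    ("$s1", ["$s6"]),
    ("$s6", ["$s1", "$s2"]),
    ("$s6", ["$s2"]),
    ("$s3", ["$s6", "$zero"]),
    (None,  ["$s6", "$s1"]),
]

def simulate_no_forwarding(program):
    """Return ([(instruction number, stall cycles), ...], total cycles)."""
    wb_cycle = {}   # register -> cycle in which its latest producer writes back
    stalls = []
    prev_read = 1   # instruction 1 is fetched in cycle 1 and reads registers in cycle 2
    for number, (dest, sources) in enumerate(program, start=1):
        earliest = prev_read + 1                      # in-order, one per cycle
        ready = max(wb_cycle.get(reg, 0) for reg in sources)
        read = max(earliest, ready)                   # wait until producers write back
        if read > earliest:
            stalls.append((number, read - earliest))
        if dest is not None:
            wb_cycle[dest] = read + 3                 # EXE, MEM, then WB
        prev_read = read
    return stalls, prev_read + 3                      # last instruction's WB cycle

stalls, total = simulate_no_forwarding(PROGRAM)
print(stalls, total)  # [(3, 2), (5, 2)] 14
```

The two entries in `stalls` are the sub (waiting on the second load) and the or (waiting on the add), each delayed by 2 cycles.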
3. Assume the 5-stage MIPS pipeline with full forwarding. Write the program with nops
to eliminate the hazards. (Hint: time travel is not possible!)
Again, draw a diagram. In the second stall, the result of the add is available in the
pipeline register at the end of the execute stage, just when the next instruction needs to
move to the execute stage, so you can forward the value of $s6 as the input to the execute
stage (as the argument for the or). This removes the second 2-cycle stall. However,
the loaded value is not ready until the end of the memory stage, so you can't use
forwarding to remove both cycles; you still need to wait 1 cycle. The solution is to
put a NOP after the second load. (Note: this is also the reason why MIPS has load
delay slots.)
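As a sketch of that rule in Python: with full forwarding, the only case that still needs a nop in this model is an instruction that reads a register loaded by the instruction immediately before it. The tuple layout and the names `PROGRAM` and `insert_nops` are assumptions made for this illustration.

```python
# Each entry: (text, is_load, destination register or None, source registers).
PROGRAM = [
    ("lw $s2, 0($s1)",     True,  "$s2", ["$s1"]),
    ("lw $s1, 40($s6)",    True,  "$s1", ["$s6"]),
    ("sub $s6, $s1, $s2",  False, "$s6", ["$s1", "$s2"]),
    ("add $s6, $s2, $s2",  False, "$s6", ["$s2"]),
    ("or $s3, $s6, $zero", False, "$s3", ["$s6", "$zero"]),
    ("sw $s6, 50($s1)",    False, None,  ["$s6", "$s1"]),
]

def insert_nops(program):
    """Insert one nop between a load and an immediately following user of its result."""
    output = []
    previous = None  # the instruction emitted just before the current one
    for text, is_load, dest, sources in program:
        if previous is not None:
            _prev_text, prev_is_load, prev_dest, _prev_sources = previous
            if prev_is_load and prev_dest in sources:
                output.append("nop")  # the loaded value is not ready until after MEM
        output.append(text)
        previous = (text, is_load, dest, sources)
    return output

for line in insert_nops(PROGRAM):
    print(line)
```

Running it prints the original six instructions with a single nop between the second lw and the sub; the add-to-or dependence needs no nop because forwarding covers it.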
Speaking of delay slots, the question was ambiguous about whether delay slots were used or
not. If they are, all of the answers are slightly different:
Part 1: You have one fewer dependence: 3 does not depend on 2, since it is in the
delay slot.
Part 2: The first stall is only 1 cycle, so the program executes in 13 cycles.
Part 3: You don't need any nops, because of the delay slot.
Question 4: More Pipelines
You are given a non-pipelined processor design which has a cycle time of 10ns and average
CPI of 1.4. Calculate the latency speedup in the following questions.
The solutions given assume the base CPI = 1.4 throughout. Since the question is ambiguous,
you could assume pipelining changes the CPI to 1. The method for computing the
answers still applies.
1. What is the best speedup you can get by pipelining it into 5 stages?
Since IC and CPI don't change, and, in the best case, pipelining will reduce CT to 10ns / 5 = 2ns:
Speedup = CT_old / CT_new = 10ns / 2ns = 5x Speedup
2. If the 5 stages are 1ns, 1.5ns, 4ns, 3ns, and 0.5ns, what is the best speedup you can
get compared to the original processor?
The cycle time is limited by the slowest stage, so CT = 4 ns.
Speedup = CT_old / CT_new = 10ns / 4ns = 2.5x Speedup
3. If each pipeline stage added also adds 20ps due to register setup delay, what is the best
speedup you can get compared to the original processor?
Adding the pipeline register delay (20ps = 0.02ns) to the cycle time, you get
CT = 4ns + 0.02ns = 4.02 ns.
Speedup = CT_old / CT_new = 10ns / 4.02ns = 2.49x Speedup
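The three cycle-time answers so far follow the same pattern, which can be sketched in Python as below. The helper names `pipelined_cycle_time` and `speedup` are made up for this illustration, and the sketch keeps the solution's assumption that IC and CPI are unchanged, so the speedup reduces to a ratio of cycle times.

```python
def pipelined_cycle_time(stage_delays_ns, register_overhead_ns=0.0):
    """Cycle time is set by the slowest stage plus any pipeline register delay."""
    return max(stage_delays_ns) + register_overhead_ns

def speedup(ct_old_ns, ct_new_ns):
    """Speedup when IC and CPI are unchanged: ratio of old to new cycle time."""
    return ct_old_ns / ct_new_ns

CT_OLD_NS = 10.0
STAGES_NS = [1.0, 1.5, 4.0, 3.0, 0.5]

ideal = speedup(CT_OLD_NS, pipelined_cycle_time([CT_OLD_NS / 5] * 5))
unbalanced = speedup(CT_OLD_NS, pipelined_cycle_time(STAGES_NS))
with_registers = speedup(CT_OLD_NS, pipelined_cycle_time(STAGES_NS, 0.02))

print(ideal, unbalanced, round(with_registers, 2))  # 5.0 2.5 2.49
```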
4. The pipeline from Q4.3 stalls 20% of the time for 1 cycle and 5% of the time for 2
cycles (these occurrences are disjoint). What is the new CPI? What is the speedup
compared to the original processor?
20% of the time: CPI = 1.4 + 1 = 2.4
5% of the time: CPI = 1.4 + 2 = 3.4
75% of the time: CPI = 1.4
New average CPI: 2.4 ∗ 0.2 + 3.4 ∗ 0.05 + 1.4 ∗ 0.75 = 1.7
Speedup = (CT_old ∗ CPI_old) / (CT_new ∗ CPI_new) = (10 ∗ 1.4) / (4.02 ∗ 1.7) = 2.049x Speedup
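The weighted-average CPI and the resulting speedup can be checked with a short Python sketch. It assumes, as the solution does, a base CPI of 1.4 and the 4.02 ns cycle time from part 3; the name `average_cpi` and the `(fraction, extra cycles)` layout are made up for this illustration.

```python
def average_cpi(base_cpi, stall_cases):
    """stall_cases: disjoint (fraction of instructions, extra stall cycles) pairs."""
    stalled_fraction = sum(fraction for fraction, _extra in stall_cases)
    cpi = (1.0 - stalled_fraction) * base_cpi            # instructions that never stall
    for fraction, extra in stall_cases:
        cpi += fraction * (base_cpi + extra)             # instructions that stall
    return cpi

BASE_CPI = 1.4
CT_OLD_NS = 10.0
CT_NEW_NS = 4.02   # slowest stage (4 ns) plus 20 ps of register delay

new_cpi = average_cpi(BASE_CPI, [(0.20, 1), (0.05, 2)])
speedup = (CT_OLD_NS * BASE_CPI) / (CT_NEW_NS * new_cpi)

print(round(new_cpi, 2), round(speedup, 3))  # 1.7 2.049
```

If you instead assume pipelining brings the base CPI down to 1, pass `1.0` as the first argument to `average_cpi`; the method is the same.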