## **6.004 Tutorial Problems**

# L12 – Introduction to Pipelining

**Latency**: the delay from when an input is established until the output associated with the input becomes valid.

- Combinational circuits: L = t<sub>PD</sub>
- K-pipeline:  $L = K^* t_{CLK}$

**Throughput**: the rate at which inputs or outputs are processed.

- Combinational circuits: T = 1/L
- K-pipeline:  $T = 1/t_{CLK}$



Unpipelined: L = 45ns, T = 1/L = 1/(45ns)2-stage pipeline [ $t_{CLK}=25ns$ ]: L = 2\*25 = 50 ns, T = 1/(25ns)

|                 |   | Clock o | cycle —            |                                              |                      |                      |  |
|-----------------|---|---------|--------------------|----------------------------------------------|----------------------|----------------------|--|
|                 |   |         | i                  | i+1                                          | i+2                  | i+3                  |  |
|                 |   |         |                    | L                                            |                      |                      |  |
| Pipeline stages | - | F & G   | $F(X_i)$           | F(X <sub>i+1</sub> )<br>G(X <sub>i+1</sub> ) | F(X <sub>i+2</sub> ) |                      |  |
|                 |   |         | G(X <sub>i</sub> ) | G(X <sub>i+1</sub> )                         | G(X <sub>i+2</sub> ) | •••                  |  |
|                 |   | Н       |                    | H(X <sub>i</sub> )                           | H(X <sub>i+1</sub> ) | H(X <sub>i+2</sub> ) |  |

#### **Pipelining methodology**:

- Form 1-pipeline by adding registers to all outputs
- To add a pipeline stage, draw contour across all paths from inputs to outputs such that it doesn't cross other contours and all inputoutput paths cross the contour in the same direction. This ensures the pipeline is wellformed (same # of registers on all inputoutput paths). A K-pipeline has K registers on all input-output paths.
- Contours must take into account pipelined components.





Pipelining trades improved Clock Period and Throughput for worse Latency. Consider the example shown below:

Problem 1. ★

A simple combinational circuit is to be pipelined for **maximum throughput using a minimal number of registers**. For each of the questions below, please create a valid K-stage pipeline. **Show your pipelining contours** and place large black circles (•) on the signal arrows to **indicate the placement of ideal pipeline registers** (tPD=0, tSETUP=0). Give the latency and throughput for each design. Remember that our convention is to place a pipeline register on each output.



L12– Introduction to Pipelining

## Problem 2.

The following 1-stage pipelined circuit computes Z from the four inputs A, B, C, and D. Each component is annotated with its propagation delay in ns.



(A) Please pipeline the circuit above for maximum throughput with the minimum possible latency using ideal pipeline registers ( $t_{PD} = 0$ ,  $t_{SETUP} = 0$ ). Show the location of pipeline registers in the diagram above using filled-in circles, like the one shown on the Z output. Please give the latency and throughput of the resulting pipelined circuit.

Latency (ns)?  $5 \times 3 = 15 \text{ ss}$ Throughput (1/ns)?  $\frac{1}{3}$ 

(B) Now suppose the "3" component is replaced by a 2-stage pipelined component with a minimum t<sub>CLK</sub> of 1.5ns. Again, please pipeline the circuit below for maximum throughput with the minimum possible latency using ideal pipeline registers. Show the location of pipeline registers in the diagram below using filled-in circles, like the one shown on the Z output. Please give the latency and throughput of the resulting pipelined circuit.



## Problem 3.

An unidentified government agency has a design for a combinational device depicted below:



analyze and improve the performance of this device.

1

(A) (1 Point) What are the throughput and latency for the unpipelined combinational device?

 $\frac{3 \times 60 \text{ ns}}{(30 \times 10^{-1} \text{ s})}$ (B) (4 Points) Show how to pipeline the above circuit for maximum throughput, by marking loca-

(B) (4 Points) Show how to pipeline the above circuit for maximum throughput, by marking locations in the diagram where registers are to be inserted. Use a minimum number of registers, but be sure to include one on the output. Assume that the registers have 0 t<sub>PD</sub> and t<sub>SETUP</sub>.

10

(mark register locations in diagram above)

(C) (1 Point) What are the latency and throughput of your pipelined circuit?

Latency: 130 ns; Throughput: 1/60 ns<sup>-1</sup>

## Problem 4. ★

The following circuit uses six full adder modules (as you've seen in lecture and lab) arranged in a combinational circuit that computes a 3-bit value F=A+B+5 for 3-bit inputs A and B:



The full adders have a  $t_{PD}$  of 6ns.

(A) Give the latency and throughput of the combinational circuit.

(1-pipeline)

6 × 4 = 24 ns

Latency: <u>2Y</u>ns; Throughput: <u>/24</u>

(B) Indicate, on the above diagram, appropriate locations to place ideal (zero-delay) registers to pipeline the circuit for maximum throughput using a minimum number of registers. Be sure to include a register on each output. 19

(mark circuit above)

(C) Give the latency and throughput of your pipelined circuit.



Latency: <u>29</u>ns; Throughput: <u>6</u>

## Problem 5. ★

(A) You are provided with the circuit shown below. Each box represents some combinational logic. The number in each box is the t<sub>PD</sub> of that combinational logic. The circuit has two inputs, X and Y, and one output Out. Pay close attention to the direction of the arrows especially the arrows shown in **bold**. What is the latency and throughput of this combinational circuit?



(B) Draw contours through the circuit above to produce a valid pipelined circuit whose  $t_{CLK} =$  **9ns with minimum latency.** Extra copies of the diagram are included below. Please use a large dot to **indicate the location of each pipeline register**. Assume that you have ideal pipeline registers ( $t_{PD}=t_{CD}=t_{Setup}=t_{Hold}=0$  ns). Pay close attention to the direction of each arrow to ensure that you produce a valid pipeline. What is the latency and throughput of this pipelined circuit?



(C) You are now asked to consider the performance of this circuit using different clock periods while achieving the minimum latency. For each suggested  $t_{CLK}$ , specify whether or not you can create a **valid** pipelined circuit using that clock period. If you can, then provide the latency and throughput of the resulting circuit and specify the number of registers at each input. If it results in an invalid pipeline, enter NA for the rest of the row.

Extra copies of the circuit diagram are provided below.

| tclk | Valid/Invalid | Latency (ns) | Throughput<br>(1/ns) | Pipeline reg-<br>isters at input<br>X | Pipeline reg-<br>isters at input<br>Y |
|------|---------------|--------------|----------------------|---------------------------------------|---------------------------------------|
| 6 ns | Invali d      |              |                      |                                       |                                       |
| 7 ns | ralid         | 28 05        | 77                   | 6                                     | lorZ                                  |



## Problem 6.

A complex combinational circuit is constructed entirely from 2-input NAND gates having a propagation delay of 1 ns. If this circuit is pipelined for maximal throughput by adding (non-ideal) registers whose setup time and propagation delay are each 1 ns, what is the throughput of the resulting pipeline? Enter a number, a formula, or "CAN'T TELL".

Throughput (ns-1):  $\frac{1/3}{5}$   $t_{cl_{R}} \ge t_{PD,R} + t_{PD,NAND} + t_{setup,R}$   $\Longrightarrow t_{cl_{R}} \ge t_{cl_{R}} \ge 1 + 1 + 1 = 3 \text{ as}$ 

6.004 Worksheet

## Problem 7.

The following combinational circuit computes F(X,Y) and G(X,Y) from inputs X and Y. The t<sub>PD</sub> (in ns) of each individual component is shown inside its box.



(A) Using ideal zero-delay registers, mark the location of the minimal number of registers necessary to achieve maximum throughput. Give the latency and throughput of your pipelined circuit.

mark diagram above Latency: 3x7=21 (ns); Throughput: 1/7 (1/ns)

Rummaging through the stockroom you find a pipelined component with two pipeline stages that can replace the "7" module. The minimum  $t_{CLK}$  for the new component is 4ns. The updated circuit is shown below.



(B) Using ideal zero-delay registers, mark the location of the minimal number of registers necessary to achieve maximum throughput. Give the latency and throughput of your pipelined circuit.

mark diagram above Latency: 5x5=75 (ns); Throughput:  $\frac{1}{5}$  (1/ns)