In this section, we will see how to apply the principles
and components of arithmetic circuits to implement a subsystem of moderate
complexity. Our objective is to design a fast 8-by-8 bit multiplier using
4-by-4 bit multipliers as building blocks, along with adders, arithmetic
logic units, and carry look-ahead units.
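The decomposition developed in the rest of this section can be previewed in executable form. The Python sketch below is an illustrative behavioral model, not part of the hardware design. The hypothetical `mul4x4` stands in for one 4-by-4 multiplier block (a 74284/74285 pair) by multiplying two 4-bit integers. The hypothetical `mul8x8` recombines the four 8-bit partial products four bits at a time and passes each slice's carry up to the next slice. Which cross product is labeled `pp1` and which is labeled `pp2` is an arbitrary choice, because the sums do not depend on it.

```python
def mul4x4(a: int, b: int) -> int:
    """Stand-in for one 4-by-4 multiplier block (a 74284/74285 pair)."""
    assert 0 <= a <= 0xF and 0 <= b <= 0xF
    return a * b  # 8-bit result


def mul8x8(a: int, b: int) -> int:
    """8-by-8 multiply built from four 4-by-4 partial products."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF

    pp0 = mul4x4(a_lo, b_lo)  # weight 2**0
    pp1 = mul4x4(a_hi, b_lo)  # weight 2**4 (cross product)
    pp2 = mul4x4(a_lo, b_hi)  # weight 2**4 (cross product)
    pp3 = mul4x4(a_hi, b_hi)  # weight 2**8

    # Bits 3-0 pass straight through from PP0.
    p3_0 = pp0 & 0xF

    # Bits 7-4: three 4-bit operands. In this integer model the carry
    # out of a slice can be 0, 1 or 2.
    s = (pp0 >> 4) + (pp1 & 0xF) + (pp2 & 0xF)
    p7_4, carry = s & 0xF, s >> 4

    # Bits 11-8: three 4-bit operands plus the carry from bits 7-4.
    s = (pp1 >> 4) + (pp2 >> 4) + (pp3 & 0xF) + carry
    p11_8, carry = s & 0xF, s >> 4

    # Bits 15-12: high nibble of PP3 plus the carry from bits 11-8.
    s = (pp3 >> 4) + carry
    assert s <= 0xF  # cannot overflow, since 255 * 255 < 2**16
    p15_12 = s

    return (p15_12 << 12) | (p11_8 << 8) | (p7_4 << 4) | p3_0


product = mul8x8(0b11110010, 0b10001100)  # 242 * 140
print(product, bin(product))  # 33880 0b1000010001011000
```

Comparing the result exhaustively against `a * b` over all 65,536 input pairs is cheap, and it confirms the slice-by-slice bookkeeping. The carry out of a three-operand slice can be 2. For example, when both inputs are `0x1F`, the bits 7-4 slice sums to 44. The integer model absorbs this automatically.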

### 5.6.1 Theory of Operation

In the last section, we saw how to express an 8-bit product as a series of sums of 1-bit products, so-called *partial product accumulation*. We can exploit the same principle to construct multipliers of wider bit widths using primitive 4-by-4 multiplier blocks.

First, we denote the two 8-bit magnitudes to be multiplied as $A_{7-0}$ and $B_{7-0}$, and the 16-bit product that results as $P_{15-0}$. We can partition $A$ and $B$ into two 4-bit groups, $A_{7-4}$, $A_{3-0}$, $B_{7-4}$, $B_{3-0}$, and form their 16-bit product as a sum of several 8-bit products:

$$
P_{15-0} = A_{7-0} \times B_{7-0}
= \left(A_{7-4} \cdot 2^4 + A_{3-0}\right)\left(B_{7-4} \cdot 2^4 + B_{3-0}\right)
= A_{7-4}B_{7-4} \cdot 2^8 + A_{7-4}B_{3-0} \cdot 2^4 + A_{3-0}B_{7-4} \cdot 2^4 + A_{3-0}B_{3-0}
$$

To see how this works, let's examine the multiplication of the 8-bit binary numbers $11110010_2$ and $10001100_2$. These correspond to the decimal numbers 242 and 140, respectively. (See Appendix A to review base conversions.) Splitting each number into 4-bit groups and applying the expansion above gives

$$
\begin{aligned}
A_{7-4} &= 1111_2 = 15, \quad A_{3-0} = 0010_2 = 2, \quad B_{7-4} = 1000_2 = 8, \quad B_{3-0} = 1100_2 = 12\\
P_{15-0} &= (15 \cdot 8) \cdot 2^8 + (15 \cdot 12) \cdot 2^4 + (2 \cdot 8) \cdot 2^4 + (2 \cdot 12)\\
&= 30720 + 2880 + 256 + 24 = 33880
\end{aligned}
$$

As a check, we see that $242 \times 140 = 33880$, which is equal to $1000010001011000_2$.

The hardware implementation follows directly from this observation. It requires four 4-by-4 multipliers, implemented as in Figure 5.31, plus logic to sum the 4-bit-wide slices of the partial products.

Let's call the four 8-bit partial products PP0, PP1, PP2, and PP3. Here $PP0 = A_{3-0}B_{3-0}$ and $PP3 = A_{7-4}B_{7-4}$, and PP1 and PP2 are the two cross products $A_{7-4}B_{3-0}$ and $A_{3-0}B_{7-4}$. Then the final product bits are computed as follows:

$$
\begin{aligned}
P_{3-0} &= PP0_{3-0}\\
P_{7-4} &= PP0_{7-4} + PP1_{3-0} + PP2_{3-0}\\
P_{11-8} &= PP1_{7-4} + PP2_{7-4} + PP3_{3-0}\\
P_{15-12} &= PP3_{7-4}
\end{aligned}
$$

Of course, any carry-out of the calculation of $P_{7-4}$ must be added to the sum for $P_{11-8}$, and likewise for the carry-out of $P_{11-8}$ to $P_{15-12}$.

### 5.6.2 Implementation

The basic blocks of the implementation are (1) the calculation of partial products, (2) the summing of the 4-bit product slices, and (3) the carry look-ahead unit. We examine each of these in turn.

**Calculation of Partial Products**

Each of the 8-bit partial products is implemented by a 74284/74285 pair. The subsystem has 16 inputs (the multiplicand and multiplier) and 32 outputs, which constitute the four 8-bit partial products. The partial product subsystem is shown in Figure 5.31.

**Calculation of Sums**

The low-order 4 bits of the final product, $P_{3-0}$, are the same as $PP0_{3-0}$ and do not participate in the sums. $P_{7-4}$ and $P_{11-8}$ are sums of three 4-bit quantities. How do we compute these?

Figure 5.32 shows a way to cascade full adders to implement a function that sums three 4-bit quantities, denoted $A_{3-0}$, $B_{3-0}$, and $C_{3-0}$. (Watch that you don't confuse the variable $C_i$ with adder carry-ins.)

The first level of full adders sums 1 bit from each of the three numbers to be added. We accomplish this by using the carry input as a data input. The second-level adders combine the carry-out from the next lower-order stage with the sum from the first-level adder. The carries simply propagate from right to left among the second-level adders, just like the carry propagation we needed in the 4-by-4 multiplier of Figure 5.28.

Figure 5.33 shows how the logic of Figure 5.32 can be implemented with TTL components. The first-level full adders are provided by 74183 dual binary adders. The second-level adders are implemented by a 74181 arithmetic logic unit configured for the adder function. This has the extra performance advantage of internal carry look-ahead logic. Note that the ALU block is drawn in its positive logic form, with positive logic data inputs and outputs and negative logic carry-in and carry-out.

Figure 5.33 provides the basic building block we can use to implement bit slices $P_{7-4}$ and $P_{11-8}$ of the result product. This is shown in Figure 5.34. The rightmost 74181 component and its two associated 74183s implement bit slice $P_{7-4}$. The logic is cascaded with an identical block of components to implement bit slice $P_{11-8}$.

Figure 5.34 also includes the implementation of slice $P_{15-12}$. The final slice is formed from the partial product $PP3_{7-4}$, plus any carry-outs from lower-order sums. We implement this using a 74181 component configured as an adder, with the *B* data inputs set to 0, the *A* inputs set to the partial product, and the carry-in coming from the adjacent adder block.

**Putting the Pieces Together**

The last step in the design combines the multiplier block with the accumulation block. To further improve performance, the rippling carries between the 74181s can be replaced with a 74182 carry look-ahead unit. This is shown in Figure 5.35.

The generate/propagate outputs of the three 74181s are wired to the corresponding inputs of the 74182 carry look-ahead unit. The component is drawn with positive logic generate and propagate inputs and negative logic carries, which matches the notation used for the ALUs. The $C_n$ input is wired high (no carry, since the carries are negative logic), matching the carry-in to the lowest-order 74181. The generated $C_{n+x}$ and $C_{n+y}$ carries are routed to the carry-ins of the middle and high-order 74181s, respectively.

**Package Count and Performance**

In terms of package count, the complete implementation uses four 74284/74285 multipliers
(eight packages), four 74183 full adder packages, three 74181 arithmetic logic units, and one 74182 carry look-ahead unit. This is a total of 8 + 4 + 3 + 1 = 16 packages.

A circuit this complex is too complicated to analyze by simply counting gate delays. We start by identifying the *critical delay path*, which is the sequence of propagated signals that limits the performance of the circuit. Once we have determined the critical path, the TTL catalog provides the signal delays associated with the individual packages in our implementation.

The first step in the critical path is the calculation of the partial products by the 74284/74285 multipliers. Assuming standard TTL components, the typical delay from the arrival of the inputs to valid outputs is 40 ns (60 ns maximum).

The next step in the critical path is the formation of the intermediate sums by the 74183s. We assume LS TTL for these packages. The typical adder delay depends on the final value of the sum output: 9 ns for a low-to-high transition and 20 ns for a high-to-low transition. Since we cannot predict which transition will occur, it is reasonable to average these to get about 15 ns. For worst-case delay, we should use the worst-case maximum, which is 33 ns.
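The intermediate sums timed above come from the first level of the three-operand adder of Figure 5.32. The Python sketch below is a bit-level behavioral model that can be used to check that structure. It is our own model, not a netlist of Figure 5.32 or 5.33, and all names are hypothetical. It follows the description given earlier. The first level uses each full adder's carry input as a third data input. The second level ripples carries from right to left and adds each first-level sum to the carry-out of the next lower-order first-level adder. The 74181 forms the same second-level sum with internal look-ahead rather than ripple, and `cin` plays the role of its carry-in. The first-level carry-out of bit 3 and the second-level carry-out both have weight 16, so the model returns them separately.

```python
from __future__ import annotations


def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    total = x + y + z
    return total & 1, total >> 1


def bit(value: int, i: int) -> int:
    """Bit i of value."""
    return (value >> i) & 1


def add3_4bit(a: int, b: int, c: int, cin: int = 0) -> tuple[int, int, int]:
    """Sum three 4-bit operands with two levels of full adders.

    Returns (sum_nibble, first_level_cout, second_level_cout); both
    carry-outs have weight 16.
    """
    # First level: one full adder per bit position, with its carry input
    # used as a data input for bit i of the third operand.
    s1, c1 = [], []
    for i in range(4):
        s, cy = full_adder(bit(a, i), bit(b, i), bit(c, i))
        s1.append(s)
        c1.append(cy)

    # Second level: carries ripple from right to left. Bit i adds the
    # first-level sum of bit i to the first-level carry out of bit i - 1.
    total, carry = 0, cin
    for i in range(4):
        lower_c1 = c1[i - 1] if i > 0 else 0
        s, carry = full_adder(s1[i], lower_c1, carry)
        total |= s << i

    return total, c1[3], carry


print(add3_4bit(0xF, 0xF, 0xF))  # (13, 1, 1): 13 + 16 * (1 + 1) == 45
```

For any inputs, `sum_nibble + 16 * (first_level_cout + second_level_cout)` equals `a + b + c + cin`. This identity is the property the two-level structure has to preserve.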

The final leg of the critical path is the calculation of the second-stage sums using the carry look-ahead unit. This consists of three pieces: (1) calculation of the group propagates/generates in the 74181s, (2) calculation of the carry-outs by the 74182 after the propagates and generates become valid, and (3) calculation of the final sums in the 74181s once the carries are valid.

We assume LS TTL for the 74181 and standard TTL for the 74182. For the 74LS181, the delay from inputs valid to group propagate/generate valid is 20 ns typical (30 ns maximum). In this case, the propagate is slightly slower than the generate, so it is the signal that really determines the delay.

Using a standard TTL 74182, the delay from group propagate/generate in to valid carry-outs is 13 ns typical, 22 ns worst case. Returning to the 74LS181, the last piece of the critical path is the delay from carry-in valid to sums valid. This is 15 ns typical, 26 ns worst case.
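The three pieces of this final leg are the three steps of a carry look-ahead addition. The Python sketch below models them behaviorally under several assumptions of our own:

* It uses plain active-high logic and ignores the negative-logic carry convention of the 74181/74182 symbols.
* It treats the three 74181s as the 4-bit groups of a single 12-bit addition of their *A* and *B* inputs.
* All function names are hypothetical.

The hypothetical `lookahead` applies the standard look-ahead equations for the carries into the middle and high groups, which the text calls *C*n + x and *C*n + y. The 74182's other outputs are not modeled.

```python
from __future__ import annotations


def nibble(value: int, k: int) -> int:
    """4-bit group k of value (group 0 is least significant)."""
    return (value >> (4 * k)) & 0xF


def group_gp(a: int, b: int) -> tuple[int, int]:
    """Group generate/propagate of one 4-bit group (piece 1)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # bit generates
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # bit propagates
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def lookahead(cn: int, g0: int, p0: int, g1: int, p1: int) -> tuple[int, int]:
    """Carries into the middle and high groups, Cn+x and Cn+y (piece 2)."""
    cnx = g0 | (p0 & cn)
    cny = g1 | (p1 & g0) | (p1 & p0 & cn)
    return cnx, cny


def add12_lookahead(a: int, b: int, cn: int = 0) -> int:
    """12-bit sum of a and b as three 4-bit groups joined by look-ahead carries."""
    g0, p0 = group_gp(nibble(a, 0), nibble(b, 0))
    g1, p1 = group_gp(nibble(a, 1), nibble(b, 1))
    cnx, cny = lookahead(cn, g0, p0, g1, p1)

    # Piece 3: each group forms its 4-bit sum once its carry-in is valid.
    result = 0
    for k, cin in enumerate((cn, cnx, cny)):
        result |= ((nibble(a, k) + nibble(b, k) + cin) & 0xF) << (4 * k)
    return result  # the carry out of the high group is discarded


print(hex(add12_lookahead(0x0FF, 0x001)))  # 0x100
```

The carries into the upper groups depend only on the group signals and *C*n, not on the sums of the lower groups. For this reason the delay of this leg is G/P, then carries, then sums, as in the three pieces listed above, rather than a ripple through every group.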

So the typical delay is 40 (multipliers) + 15 (full adders) + 20 (generate/propagate) + 13 (carry-outs) + 15 (sums) = 103 ns. The worst-case delay is 60 + 33 + 30 + 22 + 26 = 171 ns. There is a significant difference between the worst-case and the typical performance. Also, the delay can be significantly reduced by using a faster TTL family, such as S or AS logic.
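The critical path is a chain of stages, each waiting on the one before it, so its delay is simply the sum of the stage delays. The small table-driven Python sketch below reproduces both totals from the figures quoted above. The names `STAGES` and `path_delay` are our own. To evaluate a faster family such as S or AS TTL, one would substitute that family's catalog values, which are not given here.

```python
# Critical-path stages and their delays in ns: (stage, typical, worst case).
# The figures are the catalog values quoted in the text.
STAGES = [
    ("74284/74285 partial products", 40, 60),
    ("74LS183 first-level sums", 15, 33),
    ("74LS181 group propagate/generate", 20, 30),
    ("74182 carry-outs", 13, 22),
    ("74LS181 final sums", 15, 26),
]


def path_delay(stages, column: int) -> int:
    """Sum one delay column (1 = typical, 2 = worst case) along the path."""
    return sum(stage[column] for stage in stages)


typical = path_delay(STAGES, 1)
worst = path_delay(STAGES, 2)
print(f"typical {typical} ns, worst case {worst} ns")  # typical 103 ns, worst case 171 ns
```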