Real Computer Science begins where we almost stop reading ...: An 8-by-8 Bit Multiplier

Saturday, 5 October 2013

An 8-by-8 Bit Multiplier

In this section, we will see how to apply the principles and components of arithmetic circuits to implement a subsystem of moderate complexity. Our objective is to design a fast 8-by-8 bit multiplier using 4-by-4 bit multipliers as building blocks, along with adders, arithmetic logic, and carry look-ahead units.

5.6.1 Theory of Operation

In the last section, we saw how to express an 8-bit product as a series of sums of 1-bit products, a technique called partial product accumulation. We can exploit the same principle to construct wider multipliers from primitive 4-by-4 multiplier blocks. First, we denote the two 8-bit magnitudes to be multiplied as A7-0 and B7-0 and the 16-bit product that results as P15-0. We can partition A and B into two 4-bit groups each, A7-4, A3-0, B7-4, B3-0, and form their 16-bit product as a sum of four 8-bit products, each weighted by the appropriate power of 2:

P15-0 = (A7-4 * B7-4) * 2^8 + (A7-4 * B3-0 + A3-0 * B7-4) * 2^4 + (A3-0 * B3-0)

To see how this works, let's examine the multiplication of the 8-bit binary numbers 11110010 and 10001100 (both base 2). These correspond to the decimal numbers 242 and 140, respectively.

As a check, we see that 242 * 140 = 33880, which is equal to 1000010001011000 in base 2. (See Appendix A to review base conversions.)
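The decomposition can also be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names `mul4x4` and `mul8x8` are hypothetical, and `mul4x4` simply models a 4-by-4 multiplier block arithmetically rather than at the gate level.

```python
def mul4x4(a: int, b: int) -> int:
    """Arithmetic model of one 4-by-4 multiplier block (8-bit result)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mul8x8(a: int, b: int) -> int:
    """Multiply two 8-bit magnitudes using only 4-by-4 products."""
    assert 0 <= a < 256 and 0 <= b < 256
    a_hi, a_lo = a >> 4, a & 0xF  # A7-4, A3-0
    b_hi, b_lo = b >> 4, b & 0xF  # B7-4, B3-0
    # Weight each 8-bit product by its power of 2 (shift by 8, 4, or 0 bits).
    return ((mul4x4(a_hi, b_hi) << 8)
            + ((mul4x4(a_hi, b_lo) + mul4x4(a_lo, b_hi)) << 4)
            + mul4x4(a_lo, b_lo))


product = mul8x8(0b11110010, 0b10001100)  # 242 * 140
print(product, format(product, "016b"))   # 33880 1000010001011000
```

For the example operands, the four 4-by-4 products are 15 * 8 = 120, 15 * 12 = 180, 2 * 8 = 16, and 2 * 12 = 24, so the sum is 120 * 256 + (180 + 16) * 16 + 24 = 33880.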

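The hardware implementation follows directly from this observation. It requires four 4-by-4 multipliers, implemented as in Figure 5.31, plus logic to sum the 4-bit-wide slices of the partial products.
Let's call the four 8-bit partial products PP0, PP1, PP2, and PP3, and write PPi(7-4) and PPi(3-0) for the high-order and low-order 4 bits of PPi. Then the final product bits are computed as follows:

P3-0 = PP0(3-0)
P7-4 = PP0(7-4) + PP1(3-0) + PP2(3-0)
P11-8 = PP1(7-4) + PP2(7-4) + PP3(3-0)
P15-12 = PP3(7-4)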

Of course, any carry-out of the calculation of P7-4 must be added to the sum for P11-8, and likewise for the carry-out of P11-8 to P15-12.
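The slice-by-slice accumulation, including the carries between slices, can be sketched in Python as follows. This is an arithmetic model under stated assumptions, not the circuit itself: the function name `product_slices` is hypothetical, and the assignment of PP1 and PP2 to the two cross products is an assumption (the sums are symmetric in them, so the result does not depend on it).

```python
def product_slices(a: int, b: int) -> tuple[int, int, int, int]:
    """Return (P15-12, P11-8, P7-4, P3-0) for two 8-bit magnitudes."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF

    pp0 = a_lo * b_lo  # low * low
    pp1 = a_hi * b_lo  # cross product (assumed labeling)
    pp2 = a_lo * b_hi  # cross product (assumed labeling)
    pp3 = a_hi * b_hi  # high * high

    def hi(pp: int) -> int:  # bits 7-4 of a partial product
        return pp >> 4

    def lo(pp: int) -> int:  # bits 3-0 of a partial product
        return pp & 0xF

    p0 = lo(pp0)  # P3-0 needs no addition

    s1 = hi(pp0) + lo(pp1) + lo(pp2)
    p1, carry = s1 & 0xF, s1 >> 4  # in this model the carry may be 0, 1, or 2

    s2 = hi(pp1) + hi(pp2) + lo(pp3) + carry
    p2, carry = s2 & 0xF, s2 >> 4

    p3 = hi(pp3) + carry  # always fits in 4 bits because the product is < 2**16
    return p3, p2, p1, p0


print(product_slices(242, 140))  # (8, 4, 5, 8), i.e. 1000 0100 0101 1000
```

Note that a sum of three 4-bit quantities can exceed 31, so this model keeps the carry as a small integer rather than a single bit.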

5.6.2 Implementation

The basic blocks of the implementation are (1) the calculation of partial products, (2) the summing of the 4-bit product slices, and (3) the carry look-ahead unit. We examine each of these in turn.

Calculation of Partial Products. Each of the 8-bit partial products is implemented by a 74284/74285 pair. The subsystem has 16 inputs, the multiplicand and multiplier, and 32 outputs, constituting the four 8-bit partial products. The partial product subsystem is shown in Figure 5.31.

Calculation of Sums. The low-order 4 bits of the final product, P3-0, are the same as bits 3-0 of PP0 and do not participate in the sums. P7-4 and P11-8 are sums of three 4-bit quantities. How do we compute these?

Figure 5.32 shows a way to cascade full adders to implement a function that sums three 4-bit quantities, denoted A3-0, B3-0, and C3-0. (Watch that you don't confuse the variable Ci with adder carry-ins.) The first level of full adders sums 1 bit from each of the three numbers to be added. We accomplish this by using the carry input as a data input. The second-level adders combine the carry-out from the next lower order stage with the sum from the first-level adder. The carries simply propagate from right to left among the second-level adders. This is just like the carry propagations we needed in the 4-by-4 multiplier of Figure 5.28.
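The two-level arrangement described above can be modeled bit by bit. The following Python sketch is an illustration only: it does not reproduce the exact wiring of Figure 5.32, the names `full_adder` and `add3_4bit` are hypothetical, and the handling of the two highest result bits is an assumption made so that the function returns the complete sum.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def add3_4bit(a: int, b: int, c: int) -> int:
    """Sum three 4-bit quantities with two levels of full adders."""
    def bit(x: int, i: int) -> int:
        return (x >> i) & 1

    # First level: the carry input of each full adder is used as a data
    # input, so each stage adds one bit from each of the three operands.
    level1 = [full_adder(bit(a, i), bit(b, i), bit(c, i)) for i in range(4)]
    sums = [s for s, _ in level1]
    carries = [co for _, co in level1]

    # Second level: combine each first-level sum with the first-level
    # carry-out of the next lower stage; carries ripple right to left.
    out = [sums[0]]  # bit 0 has no lower stage to receive a carry from
    ripple = 0
    for i in range(1, 4):
        s, ripple = full_adder(sums[i], carries[i - 1], ripple)
        out.append(s)

    # Assumed treatment of the top bits: fold in the last first-level carry.
    s, top = full_adder(carries[3], ripple, 0)
    out += [s, top]
    return sum(value << i for i, value in enumerate(out))


print(add3_4bit(11, 1, 8))  # 20
```

The result can need up to 6 bits, since 15 + 15 + 15 = 45.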

Figure 5.33 shows how the logic of Figure 5.32 can be implemented with TTL components. The first-level full adders are provided by 74183 dual binary adders. The second-level adders are implemented by a 74181 arithmetic logic unit, configured for the adder function. This has the extra performance advantage of internal carry look-ahead logic. Note that the ALU block is written in its positive logic form, with positive logic data inputs and outputs and negative logic carry-in and carry-out.
Figure 5.33 provides the basic building block we can use to implement bit slices P7-4 and P11-8 for the result products. This is shown in Figure 5.34.

The rightmost 74181 component and its two associated 74183s implement bit slice P7-4. The logic is cascaded with an identical block of components to implement bit slice P11-8.
Figure 5.34 also includes the implementation of slice P15-12. The final slice is formed from bits 7-4 of the partial product PP3, plus any carry-outs from lower-order sums. We implement this using a 74181 component configured as an adder, with the B data inputs set to 0, the A inputs set to the partial product, and the carry-in coming from the adjacent adder block.

Putting the Pieces Together. The last step in the design combines the multiplier block with the accumulation block. To further improve the performance, the carries between the 74181s can be replaced with a 74182 carry look-ahead unit.
This is shown in Figure 5.35.

The generate/propagate outputs of the three 74181s are wired to the corresponding inputs of the 74182 carry look-ahead unit. The component is drawn with positive logic generate and propagate inputs and negative logic carries. This matches the notation used for the ALUs. The Cn input is wired high, matching the carry-in to the lowest-order 74181. The generated Cn+x and Cn+y carries are routed to the carry-ins of the middle and high-order 74181s, respectively.
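The idea behind the look-ahead carries can be sketched with the standard group generate/propagate equations. The Python below is a simplified active-high model, not a pin-accurate description of the 74182 or 74181: it ignores the signal polarities discussed above, and the names `group_gp` and `lookahead_carries` are hypothetical.

```python
def group_gp(x: int, y: int) -> tuple[bool, bool]:
    """Group generate and propagate for one 4-bit slice (active-high model)."""
    generate = (x + y) > 0xF    # slice produces a carry-out by itself
    propagate = (x ^ y) == 0xF  # slice passes an incoming carry through
    return generate, propagate


def lookahead_carries(a: int, b: int, c_in: int) -> tuple[bool, bool]:
    """Carries into the middle and high-order slices, computed in parallel."""
    c0 = bool(c_in)
    g0, p0 = group_gp(a & 0xF, b & 0xF)
    g1, p1 = group_gp((a >> 4) & 0xF, (b >> 4) & 0xF)
    c_x = g0 or (p0 and c0)                        # carry into the middle slice
    c_y = g1 or (p1 and g0) or (p1 and p0 and c0)  # carry into the high slice
    return c_x, c_y


print(lookahead_carries(0xFF, 0x01, 0))  # (True, True)
```

Both carries depend only on the group signals and the initial carry-in, so the high-order slice does not have to wait for a carry to ripple through the slices below it.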

Package Count and Performance. In terms of package count, the complete implementation uses four 74284/74285 multipliers (eight packages), four 74183 full adder packages, three 74181 arithmetic logic units, and one 74182 carry look-ahead unit. This is a total of 16 packages.

A circuit of this complexity cannot be analyzed by simply counting gate delays. Instead, we start by identifying the critical delay path, the sequence of propagated signals that limits the performance of the circuit. Once we have determined the critical path, the TTL catalog will provide us with the signal delays associated with the individual packages in our implementation.

The first step in the critical path is the calculation of the partial products by the 74284/74285 multipliers. Assuming standard TTL components, the typical delay from the arrival of the inputs to valid outputs is 40 ns (60 ns maximum).

The next step in the critical path is the formation of the intermediate sums by the 74183s. We assume LS TTL for these packages. The typical adder delay depends on the direction of the sum output's final transition: 9 ns for a low-to-high transition and 20 ns for a high-to-low transition. It is reasonable to average these to get approximately 15 ns. For the worst-case delay, we should use the worst-case maximum, which is 33 ns.

The final leg of the critical path is the calculation of the second-stage sums using the carry look-aheads. This consists of three pieces: (1) calculation of the group propagates/generates in the 74181s, (2) calculations of the carry-outs by the 74182 after the propagates and generates become valid, and (3) calculation of the final sums in the 74181s once the carries are valid.

We assume LS TTL for the 74181 and standard TTL for the 74182. For the 74LS181, from inputs valid to group propagate/generate valid takes 20 ns typical (30 ns maximum). In this case, the propagate is slightly slower than the generate, so this is the signal that really determines the delay.

Using a standard TTL 74182, the delay from group propagate/generate in to valid carry-outs is 13 ns typical, 22 ns worst case. Returning to the 74LS181, the last piece of the critical path is the delay from carry-in valid to sums valid. This is 15 ns typical, 26 ns worst case.

So the typical delay is 40 (multipliers) + 15 (full adders) + 20 (generate/propagate) + 13 (carry-outs) + 15 (sums) = 103 ns. The worst-case delay is 60 + 33 + 30 + 22 + 26 = 171 ns. There is a significant difference between the worst case and the typical performance. Also, the delay can be significantly reduced by using a faster TTL family, such as S or AS logic.
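The critical-path arithmetic can be tabulated in a few lines of Python. This is only a restatement of the figures quoted above; the list name `critical_path` and the stage labels are illustrative.

```python
# (stage, typical delay in ns, worst-case delay in ns), as quoted in the text
critical_path = [
    ("74284/74285 multipliers", 40, 60),
    ("74LS183 full adders", 15, 33),
    ("74LS181 group generate/propagate", 20, 30),
    ("74182 carry-outs", 13, 22),
    ("74LS181 sums from carry-in", 15, 26),
]

typical = sum(t for _, t, _ in critical_path)
worst = sum(w for _, _, w in critical_path)
print(typical, worst)  # 103 171
```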
