Real Computer Science begins where we almost stop reading ...: Combinational Multiplier

Saturday, 5 October 2013

Combinational Multiplier

In this section we look at the design of multiplier circuitry. The methods we introduce are combinational, although alternative methods based on circuits with state are also possible. However, the fastest circuits for multiplication use just the techniques we will be discussing here.

Basic Concept Throughout this section, we will look only at multiplication techniques for unsigned numbers. Alternatively, the hardware we present is suitable for sign and magnitude multiplication, but we concentrate on the manipulation of the magnitude part. Recall that the two numbers involved in a multiplication are called the multiplicand and the multiplier.
The process of binary multiplication is best illustrated with an example. In this case, the multiplicand is 11012 (13) and the multiplier is 10112 (11):

Each bit of the multiplier is multiplied against the multiplicand, the product is aligned according to the position of the bit within the multiplier, and the resulting products are then summed to form the final result. One attraction of binary multiplication is how easy it is to form these intermediate products: if the multiplier bit is a 1, the product is an appropriately shifted copy of the multiplicand; if the multiplier bit is a 0, the product is simply 0.

For an n-bit multiplicand and multiplier, the resulting product will be 2n bits. Stated differently, the product of 2n and 2n is 2n + n = 22n, a 2n-bit number. Thus, the product of two 4-bit numbers requires 8 bits, of two 8-bit numbers requires 16 bits, and so on.

Partial Product Accumulation We can construct a combinational circuit that directly implements the process described by the preceding example. The method is called partial product accumulation.
First, we rewrite the multiplicand bits as A3, A2, A1, A0 and the multiplier bits as B3, B2, B1, B0. The multiplication of A and B becomes

Each of the ANDed terms is called a partial product. The resulting product is formed by accumulating down the columns of partial products, propagating the carries from the rightmost columns to the left.
A combinational circuit for implementing the 4-bit multiplier is shown in Figure 5.28.

The first level of 16 AND gates computes the individual partial products. The second- and third-level logic blocks form the accumulation of the products on a column-by-column basis. The column sums are formed by a mixture of cascaded half adders and full adders. In the figure, inputs from the top are the bits to be added and the input from the right is the carry-in. The output from the bottom is the sum and to the left is the carry-out.

To see how the partial products are accumulated, let's look at the circuit of Figure 5.28 in a little more detail. S1 is the straightforward sum of just two partial products, A1 · B0 and A0 · B1. S2 is the sum of three products. We implement this with two cascaded adders, one of which takes the carry-out from S1's column.

S3 is a little more complicated, because there are two different carry-outs from the previous column. We use three cascaded adders, two full adders and one half adder, to implement the sum. The two carry-outs from S2 are accumulated through the carry-in inputs of the two full adders.

S4 is the sum of three products and three possible carry-outs from S3. The carries make this case more complicated than the S2 sum. The solution is to implement the sum with three adders-two full adders and one half adder. The two full adders sum the three products and two of the carry-outs. The half adder adds to this result the third possible carry-in.

The logic for S5 is similar. Here we must sum two products and three carries. Two full adders do the job. Note that the second full adder sums two of the three carries from the previous column with the result of the first full adder. A similar analysis applies to S7 and S6.

The delay through the multiplier is determined by the ripple carries be-tween the adders. We can use a carry look-ahead scheme to reduce these delays.

Clearly, the full combinational multiplier uses a lot of hardware. The dominating costs are the adders-four half adders and eight full adders. To simplify the implementation slightly, a designer may choose to use full adders for all of the adder blocks, setting the carry input to 0 where the half adder function is required. Given the full adder schematic of Figure 5.7, this is 12 adders of six gates each, for a total of 72 gates. When we add to this the 16 gates forming the partial products, the total for the whole circuit is 88 gates. It is easy to see that combinational multipliers can be justified only for the most high performance of -applications.
A slightly different implementation of the 4-by-4 combinational multiplier is shown in Figure 5.29.

Figure 5.29(a) gives the basic building block, a full adder circuit that sums a locally computed partial product (X · Y), an input passed into the block from above (Sum In), and a carry passed from a block diagonally above. It generates a carry-out (Cout) and a new sum (Sum Out). Figure 5.29(b) shows the interconnection of 16 of these blocks to implement the full multiplier function. The Ai values are distributed along block diagonals and the Bi values are passed along rows. This implementation uses the same gate count as the previous one: 16 AND gates and 12 adders (the top row does not need adders).
TTL Multiplier Components The TTL components 74284 and 74285 provide a two-chip implementation of a 4-by-4 parallel binary multiplier.

Figure 5.30 illustrates their use. The 74284 component implements the high-order 4 bits of the product, while the 74285 implements the low-order bits. Both chips have dual active low output enable signals,