Real Computer Science begins where we almost stop reading ...: Finite State Machine Design with Programmable Logic

Sunday, 6 October 2013

Finite State Machine Design with Programmable Logic

10.1.1 Mapping a State Machine into a ROM Implementation

A ROM or PAL/PLA is a convenient way to implement the combinational logic of a finite state machine. The state is stored in an external register whose outputs are fed back as addresses to the ROM or inputs to the PAL/PLA.

Figure 10.1 shows the general block diagram of a synchronous Mealy finite state machine and how this is mapped onto a ROM implementation.

In the figure, the finite state machine has n inputs, k outputs, and m state bits. The ROM address bits are at the left of the ROM block, with the data bits at the right. The n + m input bits and next-state bits form the ROM's address lines. The k + m output and state bits form the ROM's output lines. Thus, we need a ROM of 2n + m words of k + m bits each to implement this finite state machine.

It is easy for you to fill in the ROM contents, given an encoded state transition table organized as a truth table. Each row of the table, identified by the input and current state bits, selects a unique word of the ROM. You store the bit pattern for the next state and outputs in this word.

Effect on the Design Process Using programmable logic for the finite state machine implementation has important implications for state assignment, choice of flip-flop, and state machine partitioning. Unlike discrete logic, which has few external constraints, programmable logic has many. For example, programmable logic devices have a fixed number of pins, preassigned to be inputs or outputs. The internal resources, be they product terms or ROM words, are severely limited. This leads to more complexity when mapping a design onto programmable logic.

It simplifies your design if the next-state and output functions can be made to "fit" within a single component. Unfortunately, this is not always possible for complex finite state machines. You may need the partitioning strategies of Chapter 9 to map the state machine onto as few components as possible.

Since a programmable logic component's outputs are limited, we prefer D flip-flops or MSI registers for the state. We rarely use J-K flip-flops, with their two wires per bit. This helps to simplify the flip-flop choice.

The choice of programmable logic also affects the state assignment process. Since the complexity of a ROM is determined solely by the number of inputs and outputs, the kind of ROM you need is the same regardless of the state assignment. However, a PAL/PLA, with its limited product terms, requires a good state assignment to keep manageable the number of terms to implement the next state and output logic.

ROM Versus PLA-Based Design

Let's compare the implementation of the same finite state machine using a ROM and a PLA. The function to be implemented is a BCD-to-Excess-3 code converter. We obtain the Excess-3 code by adding 112 to the BCD number, as shown in Figure 10.2.

We could easily implement this as a 4-bit combinational logic function. However, our machine will be designed to accept a bit-serial BCD number, starting with the least significant bit. The machine has one input X, the BCD bit, and one output Z, the corresponding Excess-3 bit.
Reduced State Diagram/Symbolic State Transition Table We will not obtain the state transition table and state diagram here, but simply present them in Figure 10.3 and Figure 10.4.

You should verify that the state diagram actually implements the conversion (see Exercise 10.1). By the way, you may wonder why state S6 has only one transition. Since the input is always a BCD digit, if you get to this state, the next input can never be 1.

State Assignment for ROM-Based Implementation We assume that the seven states of the finite state machine are encoded in 3 bits. Thus, the ROM-based implementation requires a 16-word ROM (four address bits: input X and three current-state bits, Q2, Q1, Q0) with four data outputs (output Z and three next-state bits, D2, D1, D0). Since the size of the ROM is independent of the state assignment, a straightforward assignment will be used (that is, S0 = 000, S1 = 001, etc.). This leads us to the truth table of Figure 10.5.

The inputs at the left form the ROM's address and the outputs at the right are the contents of the ROM word at that address.
Hardware Implementation for the ROM-Based Design We choose a synchronous Mealy machine implementation, as shown in Figure 10.6.

The ROM addresses are at the left of the converter ROM, with the outputs at the right. The 74175 is a TTL component with quad positive edge--triggered D flip-flops with common clear and clock signals.

It is important to understand the timing behavior of our implementation. Figure 10.7 gives the timing waveforms for the converter's output behavior, for the input strings 0000 (0) and 1110 (7). Don't forget that these arrive in the sequence from least significant bit to most significant bit.

Let's start with the first string, 0000. Inputs are sampled on the rising edge, with the output becoming valid just after the edge. After reset, a 0 at the input results in a 1 at the output after the clock edge. At the next rising edge, the input is still 0 and the output stays 1. The output falls and stays low for the next two positive edges. Thus, the string 0000 is converted to 1100, representing a binary three when read backward.

Similarly, the string 1110 generates the output 0101. On the first rising edge, the input is 1 and the output stays 0. On the next positive edge, the input is 1 but the output rises. The input stays 1 for the third positive edge, but the output goes low. Before the fourth and final edge, the input goes low, causing the output to come high. Thus, a backward seven has been converted to a backward ten.
State Assignment for PAL/PLA-Based Implementation For a PAL/PLA based implementation, we must first perform a state assignment with a tool like nova, followed by a two-level minimization with espresso. We show the nova input file in Figure 10.8.

Suppose that the tool (or a pencil-and-paper method) yields the following state assignment (other assignments might be equally good):
S0 = 000
S1 = 001
S2 = 011
S3 = 110
S4 = 100
S5 = 111
S6 = 101

To find the minimized two-level implementation, we next run espresso. The espresso input file that corresponds to this state assignment is given in Figure 10.9. We have added the don't-care transitions to obtain the best possible reductions.

The espresso results, shown in Figure 10.10, lead to the following reduced logic:

If we select a PLA to implement the logic, it must provide four inputs, four outputs, and at least nine product terms. The implementation schematic is shown in Figure 10.11.

It shouldn't surprise you that it is essentially identical to Figure 10.6. We only replace the converter ROM with a PLA.

PAL Versus PLA-Based Implementations As mentioned in Chapter 4, the delay in a PLA tends to be worse than that in a comparable PAL. Although PLAs are very popular in full custom VLSI designs, PALs are more likely to be used in conventional board-level designs.

If you select a PAL to implement the code converter finite state machine, then (at least) one of the gates of the OR plane must have a four-input fan-in. Not every PAL has this kind of internal architecture. For example, a 10H8 PAL (10 inputs, eight outputs, two inputs per OR gate) could not implement the next-state logic without some modification to the equations, but a 12H6 PAL (12 inputs, six outputs, two OR gates with four inputs each, four OR gates with two inputs each) could implement it without modification.

Suppose that as the designer, you have only the simpler 10H8 PAL available. It is not difficult to get the equations into the right form. You simply rewrite them as subexpressions that do not exceed the maximum product term count for the PAL. Here is the way to write the next-state function D1 as expressions with no more than two product terms per function:

Figure 10.12 shows the partial 10H8 PAL programming map for this logic. The programming map includes indices for the vertical input lines and the horizontal product term lines, to let you program down to the individual crosspoints. The horizontal rows correspond to the following product terms:

0. 24. D11
1. 25. D12
8. 32.
9. 33. Not used
16. 40. X Q1
17. 41.

The D11 and D12 outputs are simply fed back as inputs, which are ORed within the PAL to form the expression for D1.

Figure 10.13 shows the pin-out and wiring. The PAL still has four inputs and two outputs that are uncommitted. These are available to implement other logic functions.
Partitioning functions in this way has some performance implications. The propagation delay to compute D1 is essentially twice that of the other outputs, since it passes through the AND-OR network twice while they pass through only once. Thus, D1 could easily determine the performance-limiting critical delay path.

One alternative is to use a more sophisticated PAL that provides interconnect for outputs to be fed back as inputs in the AND array. This is a -little faster than the implementation suggested by Figure 10.13, but still results in a fundamentally multilevel logic circuit. If this is a serious -problem for the performance of the system, you will need to choose a PAL containing OR gates with a sufficient number of product term fan-ins.

10.1.3 Alternative PAL Architectures

So far, we have examined PALs in their simplest form: relatively small fan-in structures suitable for implementing simple combinational logic. PAL architectures are actually much richer than this. For example, some members of the PAL family contain on-chip flip-flops, they can generate either positive or negative logic outputs, they can allow a pin to be selectively programmed for input or output, or they include XOR gates. Let's examine these alternatives.

Registered PALs A registered PAL is a programmable AND array device with on-chip flip-flops associated with the output pins. These devices are particularly useful for implementing synchronous Mealy machines or Moore machines. Part of the programming map for a registered PAL is shown in Figure 10.14.

Positive edge-triggered D flip-flops latch the OR plane outputs. These are gated to the output pins through tri-state inverting buffers when the

signal is asserted low.
Depending on the details of the PAL architecture, you may supply the

signal by a dedicated input pin or drive it from a product term computed within the array. Similarly, you can provide the Clock signal from off-chip or compute it within the PAL.

Figure 10.14 shows the output of the flip-flop fed back to the AND array. If this is a next-state function, the signal fed back serves as the current-state input.

The feedback line is the negative logic output of the flip-flop. This isn't as confusing as it may seem. It is common practice to implement the negative logic form of the function within the AND array. The inverting tri-state buffer complements this function to get back to the positive logic form. The feedback buffer drives the function and its complement back into the AND array.

The CAD software usually hides the issues of signal polarities from you. For example, the ABEL system understands the underlying PAL architecture and handles the polarities accordingly. This is why you need to identify the device in the ABEL description. The ABEL software implements sophisticated methods for mapping Boolean equations onto the specific PAL devices.
Example

	Code Converter Implemented with a Registered
PAL

The function implemented at the input of the flip-flop in the PAL fragment of Figure 10.14 is the negative logic for the next-state function D2. ABEL handles the translation to negative logic directly; otherwise minimization tools let you find the optimized form of a function's complement. For example, you can specify the -epos option in espresso to minimize the complements of the functions specified in the input file.

In negative logic, the code converter next-state equations become

Since both the positive and negative forms of the current-state bits are available to the AND plane, the detailed programming of the functions is much as before.
PALs with Programmable Outputs More sophisticated PALs allow you to program the output's polarity, giving you the option of positive or negative logic outputs.

Figure 10.16 shows how this is done in the PAL. An XOR gate is placed between the output of the OR gate and the input to the D flip-flop. One of XOR's inputs can be programmed with a connection to ground. When this fused connection is blown, the input is treated as though it were high and the XOR behaves like an inverter. It passes the inverted signal to the D flip-flop, which inverts the signal one more time on the way to the output pin and the feedback path. When the connection is left intact, the input is 0 and the XOR behaves like a noninverting buffer.
Programmable polarity can also help you overcome the limited product term inputs to the OR gate. The complement of a function in sum of products form may use fewer terms than the function itself. You can implement the function in its complemented form, using the XOR gate to return the function to its true sense.

Additional Variations on PAL Architectures PAL architectures continue to evolve, giving the basic PAL structure ever more function and flexibility. We mention some of the newer variations here.

XOR PALs (as distinct from PALs with programmable polarity) -contain internal XOR gates whose inputs are fed from AND-OR array structures. They are well suited for computing certain arithmetic functions that would otherwise generate many product terms.
An example is the P20X10 PAL of Figure 10.16.

It also illustrates some of the complexities of the detailed structure of a typical PAL. The 20X10 contains 10 data input pins and 10 output pins with internal D flip-flops. The outputs are fed back into the AND plane, a total of 20 possible AND array inputs. Pin 1 is a dedicated input that drives the flip-flop clocks. Pin 13 controls the tri-state enables of the inverting output buffers of the registered outputs. Two-input XOR gates at the inputs to the D flip-flops are driven by OR gates with two product terms each.

As an example, let's look at alternative PAL-based implementations of the expression A Ý B Ý C Ý D. Using a conventional AND-OR PAL, we need 8 four-literal product terms. But we could also implement this in only four terms in an AND-OR-XOR structure. The two alternatives are shown in Figure 10.17.

Typical registered PALs have several limitations on their inputs and outputs. For example, an output pin is dedicated to the register even if it is used to hold state information that is never provided off-chip. One solution is to have "buried" registers, with the outputs decoupled from the registers. The register can feed back a signal to the AND array while the output pin carries a different signal. Or the outputs may be multiplexed, allowing the output pin to be connected to the combinational output from the array or the registered output from the flip-flop. An example of this more sophisticated PAL structure is the P22V10, to be shown in Figure 10.28. We will discuss it at that time.

10.1.4 Specifying PALs with ABEL

In this section, we describe how to specify PAL-based implementations of Boolean functions using ABEL. If you do not have access to this software, you may want to skim or skip this subsection.

An advantage of programmable logic is the power of the CAD tools provided with a particular PAL design environment. These map the positive logic equations into the form needed for a target PAL architecture. Many, but not all, of the device details are hidden from the designer.

In Chapter 8 we saw how ABEL can specify the behavior of finite state machines. ABEL maps the descriptions into the programming map, the list of fuses to be blown within the device. Your ABEL description must alert the software about the PAL you plan to use through the device statement.

In this subsection, we will examine mapping the BCD-to-Excess-3 code converter state machine onto various PAL devices.

Implementation with P10H8 ABEL's strength and weakness is its close relationship with the underlying PAL architecture. Let's look at the implementation of the state machine of Figure 10.13. Because this particular PAL has limited product term resources, you must describe the D1 next-state function in terms of two simpler expressions, D11 and D12. The software will not automatically perform this partitioning. The ABEL description must reflect your partitioning decisions:

module bcd2excess3

title 'BCD to Excess 3 Code Converter State Machine

	Joe Engineer, Itty Bitty Machines, Inc.'

u1	device		'p10h8';

"Input Pins

	X,Q2,Q1,Q0,D11i,D12i	 				pin 	1,2,3,4,5,6;

"Output Pins

	D2,D11o,D12o,D1,D0,Z	 				pin 19,18,17,16,15,14;

INSTATE = [Q2, Q1, Q0];

S0 = [0, 0, 0];

S1 = [0, 0, 1];

S2 = [0, 1, 1];

S3 = [1, 1, 0];

S4 = [1, 0, 0];

S5 = [1, 1, 1];

S6 = [1, 0, 1];

equations

	D2 = (!Q2 & Q0) # (Q2 & !Q0);

	D1 = D11i # D12i;

	D11o = (!X & !Q2 & !Q1 & Q0) # (X & !Q2 & !Q0);

	D12o = (!X & Q2 & !Q0) # (Q1 & !Q0);

	D0 = !Q0;

	Z = (X & Q1) # (!X & !Q1);

test_vectors

([X, INSTATE, D11i, D12i]->[D2, D1, D0, D11o, D12o, Z])

[0, S0,.X.,.X.] -> [0,.X., 1, 0, 0, 1];
[0, S0, 0, 0] -> [0, 0, 1,.X.,.X., 1];
[0, S1,.X.,.X.] -> [1,.X., 0, 1, 0, 1];
[0, S1, 1, 0] -> [1, 1, 0,.X.,.X., 1];
[0, S3,.X.,.X.] -> [1,.X., 1, 0, 1, 0];
[0, S3, 0, 1] -> [1, 1, 1,.X.,.X., 0];
[0, S5,.X.,.X.] -> [0,.X., 0, 0, 0, 0];
[0, S5, 0, 0] -> [0, 0, 0,.X.,.X., 0];
end bcd2excess3;

The device statement specifies a P10H8 PAL. The input and output pin descriptions match the assignment of Figure 10.13. Since the P10H8 is purely combinational, the description simply consists of the Boolean equations for the PAL's outputs: D2, D1, D11, D12, D0, and Z. Registers and clocking logic are external to the PAL (see Figure 10.11).

The test vectors are complex because D1 is computed in two steps. D11 and D12 are computed first and then ORed to form D1. We need two test vectors for each step: one for computing the intermediate values D11 and D12, and one for the final value of D1.

The first test vector has input X at 0 and the machine in state S0. D2, D0, D11, D12, and Z are checked by the output vector, with D1 left as a don't care. The second test vector takes the current value for the state and input and the values for D11 and D12 just checked by the previous vector. It then verifies that the next state outputs become 001, which is the encoding for S1.

Similarly, the third and fourth vectors verify the transition from S1 to S3 on 0 generating a 1. The fifth and sixth vectors do the same for the S3-to-S5 transition on 0, generating a 0. The last two vectors handle the case of the transition from S5 to S0 on 0, still outputting a 0.

Implementation with a 12H6 PAL To see how the PAL architecture can influence the ABEL description, let's examine the implementation of the converter state machine using a 12H6 PAL. This device has 12 inputs and six outputs. Two of the outputs are defined by four product terms (output pins 18 and 13). The ABEL description becomes

module bcd2excess3

title 'BCD to Excess 3 Code Converter State Machine

	Joe Engineer, Itty Bitty Machines, Inc.'

u1	device	 'p12h6';

"Input Pins

	X, Q2, Q1, Q0	 			pin	 1, 2, 3, 4;

"Output Pins

	D2, D1, D0, Z	 			pin 17, 18, 16, 15;

INSTATE = [Q2, Q1, Q0];	 OUTSTATE = [D2, D1, D0];

S0in = [0, 0, 0];       S0out = [0, 0, 0];

S1in = [0, 0, 1];       S1out = [0, 0, 1];

S2in = [0, 1, 1];       S2out = [0, 1, 1];

S3in = [1, 1, 0];       S3out = [1, 1, 0];

S4in = [1, 0, 0];       S4out = [1, 0, 0];

S5in = [1, 1, 1];       S5out = [1, 1, 1];

S6in = [1, 0, 1];       S6out = [1, 0, 1];

equations

	D2 = (!Q2 & Q0) # (Q2 & !Q0);

	D1 = (!X & !Q2 & !Q1 & Q0) # (X & !Q2 & !Q0) # 

		 (!X & Q2 & !Q0) # (Q1 & !Q0);

	D0 = !Q0;

	Z = (X & Q1) # (!X & !Q1);

test_vectors ([X, INSTATE] -> [OUTSTATE, Z])
[0, S0in] -> [S1out, 1];
[0, S1in] -> [S3out, 1];
[0, S3in] -> [S5out, 0];
[0, S5in] -> [S0out, 0];
end bcd2excess3;

This description is much more straightforward because we don't need the artificial partitioning of D1. The test vectors are also less complicated.

Implementation with a 16R6 PAL In our final case, we look at describing the state machine for a registered PAL, in this case a 16R6. The PAL structure is shown in Figure 10.18. It has eight data inputs, eight feedback inputs from the outputs, six registered outputs (pins 13, 14, 15, 16, 17, 18), two combinational outputs (pins 12 and 19), a clock input (pin 1), and an active low output enable (pin 11). Each output can be defined by up to eight product terms.

The embedded registers allow us to implement the state machine in a single device. However, we must add an input to reset the machine to state S0, as well as the clock and output enable inputs. The ABEL description is

module bcd2excess3

title 'BCD to Excess 3 Code Converter State Machine

	Joe Engineer, Itty Bitty Machines, Inc.'

u1	device		'p16r6';

"Input Pins

	Clk, Reset, X,  !OE pin  1,  2,  3,  11;

"Output Pins

	D2,  D1,    D0, Z   pin  13, 14, 15, 12;

SREG = [D2, D1, D0];

S0 = [0, 0, 0];

S1 = [0, 0, 1];

S2 = [0, 1, 1];

S3 = [1, 1, 0];

S4 = [1, 0, 0];

S5 = [1, 1, 1];

S6 = [1, 0, 1];

state_diagram SREG

state S0: if Reset then S0

	    else if X then S2 with Z = 0 

	    else S1 with Z = 1

state S1: if Reset then S0
else if X then S4 with Z = 0
else S3 with Z = 1

state S2: if Reset then S0
else if X then S4 with Z = 1
else S4 with Z = 0

state S3: if Reset then S0
else if X then S5 with Z = 1
else S5 with Z = 0

state S4: if Reset then S0
else if X then S6 with Z = 0
else S5 with Z = 1

state S5: if Reset then S0

	    else if X then S0 with Z = 1 

	    else S0 with Z = 0

state S6: if Reset then S0

	    else if !X then S0 with Z = 1

test_vectors ([Clk, Reset, !OE, X] -> [SREG, Z])
[.C., 1, 0,.X.] -> [S0,.X.];
[ 0, 0, 0, 0] -> [S0, 1];
[.C., 0, 0, 0] -> [S1,.X.];
[ 0, 0, 0, 0] -> [S1, 1];
[.C., 0, 0, 0] -> [S3,.X.];
[ 0, 0, 0, 0] -> [S3, 0];
[.C., 0, 0, 0] -> [S5,.X.];
[ 0, 0, 0, 0] -> [S5, 0];
[.C., 0, 0, 0] -> [S0,.X.];
end bcd2excess3;

We use the state_diagram statement in this description. The next-state functions of this PAL are sequential, not combinational as in the previous cases. In each state, the Reset input is checked. If it is asserted, the machine returns to state S0.

The active low signal

is specially handled in the description. In the input pin description, the signal is declared to be active low by listing it as !OE. To retain the sense of the signal, we must use !OE in the test vector and list the signal's value as 0 in cases in which the output tri-states are turned on.

Take a close look at the way we have specified the test vectors. The ABEL description specifies the converter as an asynchronous Mealy machine: the output Z is combinational rather than registered. To make Z synchronous, we need to associate Z with one of the registered outputs, such as pin 17. Thus, for a given state transition, the output is valid at the next rising edge. We model this in the set of test vectors with two vectors. The first moves the machine into a new state, ignoring the output. The second verifies that the output is correct with the machine in a given state and the clock at 0.

The first vector in the description checks that reset works correctly. When the clock edge comes, if Reset is asserted, the machine enters state S0. With an input of 0 and being in S0, the machine outputs a 1. On the next clock edge, the machine advances to state S1. We do not check the output until the clock falls. Since the input is still 0, the output is 1.

On the next edge, the machine goes to state S3. The output is checked when the clock goes low, when it should be 0 if the input is still 0. On the next positive clock edge, the machine enters S5. Since the input remains 0, the output stays at 0. Finally, the next clock edge leads the machine back to state S0.

Real Computer Science begins where we almost stop reading ...