Sunday, 6 October 2013

Structure of a Computer


Figure 11.1 shows a high-level block diagram of a computer. It is decomposed into a central processing unit (CPU), or processor, and an attached memory system. In turn, the processor is decomposed into data-path and control units.
The datapath (also called the execution unit) contains registers for storing intermediate results and combinational circuits for operating on data, such as shifting, adding, and multiplying. The latter are sometimes called functional units because they apply functions to data. Data is moved from memory into registers. It is then moved to the functional units, where the data manipulations take place. The results are placed back into registers and eventually put back into memory. The data-path implements the pathways along which data can flow from registers to functional units and back again.

The control unit (or instruction unit) implements a finite state machine that fetches a stream of instructions from memory. The instructions describe what operations, such as ADD, should be applied to which operands. The operands can be found in particular registers or in memory locations.

The control unit interprets or "executes" instructions by asserting the appropriate signals for manipulating the data-path, at the right time and in the correct sequence. For example, to add two registers and place the results in a third register, the control unit (1) asserts the necessary control signals to move the contents of the two source registers to the arithmetic logic unit (ALU), (2) instructs the ALU to perform an ADD operation by asserting the appropriate signals, and (3) moves the result to the specified destination register, again by asserting signals that establish a path between the ALU and the register.
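The three steps above can be mirrored in a toy software model. This is only a sketch: the register names, the dictionary register file, and the `execute_add` helper are illustrative assumptions, and in hardware the "steps" are control signals asserted in sequence, not function calls.

```python
# Toy model of the control unit's three-step register-to-register ADD.
regs = {"R1": 5, "R2": 7, "R3": 0}   # hypothetical register file contents

def execute_add(src_a, src_b, dest):
    alu_a = regs[src_a]       # (1) gate the source registers onto the ALU inputs
    alu_b = regs[src_b]
    alu_s = alu_a + alu_b     # (2) assert the ALU's ADD operation signals
    regs[dest] = alu_s        # (3) gate the ALU result into the destination

execute_add("R1", "R2", "R3")
```

After the sequence, R3 holds the sum while the source registers are unchanged.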

Instructions can be grouped into three broad classes: data manipulation (add, subtract, etc.), data staging (load/store data from/to memory), and control (conditional and unconditional branches). The latter class determines the next instruction to fetch, sometimes conditionally based on inputs from the data-path. For example, the instruction may be to take the branch if the last data-path operation resulted in a negative number.

You are already familiar with the basic building blocks needed to implement the processor. You can interconnect NAND and NOR gates to build adder and logic circuits (Chapter 5) and registers (Chapters 6 and 7). The processor control unit is just another finite state machine (Chapters 8, 9, and 10). In the rest of this section, we will examine the components of a computer in a little more detail, as a prelude to the rest of this chapter.


11.1.1 Control

A processor control unit is considerably more complex than the kinds of finite state machines you have seen so far. A simple finite state machine is little more than next-state and output logic coupled to a state register. A control unit, on the other hand, needs access to its own data-path, a collection of registers containing information that affects the actions of the state machine.

Program Counter/Instruction Register For example, the control unit may have a register to hold the address of the next memory word to fetch for instruction interpretation. This is frequently called the program counter, or PC. When the instruction is moved from memory into the control unit, it must be held somewhere while the control decodes the kind of instruction it is. This staging memory, often implemented by a register, is called the instruction register or IR. It is a special-purpose register and is usually not visible to the assembly language programmer.
Basic States of the Control Unit The control unit can be in one of four basic phases: Reset, Fetch the Next Instruction, Decode the Instruction, and Execute the Instruction. A high-level state diagram for a typical control unit is shown in Figure 11.2.
Let's begin with the initialization sequence. An external reset signal places the finite state machine in its initial Reset state, from which the processor is initialized. Since the state of the processor contains more than just the state register of the finite state machine, several of the special registers must also be set to an initial value. For example, the PC must be set to some value, such as 0, before the first instruction can be fetched. Perhaps an accumulator register or a special register holding an indication of the condition of the data-path will be set to 0 as well. Although shown as a single state in the figure, the initialization process may be implemented by a sequence of states.
Next, the machine enters the Fetch Instruction state. The contents of the PC are sent as an address to the memory system. Then the control generates the signals needed to commence a memory read. When the operation is complete, the instruction is available on the memory's output wires and must be moved into the control unit's IR. Again, Fetch Instruction looks like a single state in the figure, but the actual implementation involves a sequence of states.

Once the instruction is available in the IR, the control examines certain bits within the instruction to determine its type. Each instruction type leads to a different sequence of execution states. For example, the basic execution sequence for a register-to-register add instruction is identical to one for a register-to-register subtract. The operands must be moved to the ALU and the result directed to the correct register destination. The only difference is the operation requested of the ALU. As long as the basic data movements are the same, the control sequences can be parameterized by the specific operation, decoded directly from the instruction.

The state machine in the figure partitions the instructions into three classes: Branch, Load/Store, and Register-to-Register. Of course, there could be more classes. In the limit, there could be a unique execution sequence for each instruction in the processor's instruction set.

The final state takes care of housekeeping operations, such as incrementing the PC, before branching back to fetch the next instruction. The execution sequence for a taken branch modifies the PC itself, so it bypasses this step. The sequence of instruction fetch, execute, and PC increment continues until the machine is reset.

While the details of the state diagram may vary from one instruction set to another, the general sequencing and the shape of the state diagram are generic to CPU state machines. The most distinguishing feature is the multiway decode branch between the instruction fetch and its execution. This influences the design of controllers for simple CPUs that we describe in the next chapter.
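The overall sequencing (fetch, multiway decode, execute, PC housekeeping, repeat) can be sketched as a loop. The tuple instruction encoding and the class names below are illustrative assumptions, not from any real instruction set:

```python
# Sketch of the control sequencing of Figure 11.2: fetch, multiway decode on
# the instruction class, execute, then PC housekeeping.  A taken branch
# writes the PC itself and so bypasses the increment step.
def run(program, steps):
    pc, trace = 0, []
    for _ in range(steps):
        instr = program[pc]            # Fetch: PC addresses instruction memory
        klass = instr[0]               # Decode: multiway branch on the class
        trace.append(klass)            # Execute (per-class details elided)
        if klass == "BRANCH":
            pc = instr[1]              # taken branch modifies the PC directly
        else:
            pc += 1                    # housekeeping: increment the PC
    return trace

trace = run([("REGREG", "ADD"), ("LOADSTORE", "LD"), ("BRANCH", 0)], 6)
```

The three-instruction loop repeats until the step budget (standing in for reset) runs out.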


11.1.2 Datapath

The elements of the data-path are built up in a hierarchical and iterative fashion. Consider how we go about constructing a 32-bit arithmetic unit for inclusion in the data-path. At the most primitive level, we begin with the half adder, which can add 2 bits. By interconnecting two of these (plus an OR gate to combine their carry-outs), we create the full adder. Once we have a single "bit slice" for the data-path object, we create as many instances of it as we need for the width of the data-path. For example, we construct the 32-bit adder of the ALU by iteratively composing 32 instances of a 1-bit-wide adder.
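This iterative composition can be sketched in software: the bit slice below is the standard full-adder logic, iterated 32 times with the carry rippling from one slice to the next.

```python
# A 32-bit adder built by iterating a 1-bit full-adder "bit slice".
def full_adder(a, b, cin):
    # One slice: sum and carry-out of three input bits.
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def add32(x, y):
    s, carry = 0, 0
    for i in range(32):                          # one slice per bit of width
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        s |= bit << i
    return s, carry                              # carry-out exits the top slice
```

Rippling the carry through all 32 slices is slow in hardware, which is exactly what the carry lookahead mentioned next addresses.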

As we saw in Chapter 5, an ALU bit slice is somewhat more complicated than this. It should also include hardware for logic operations and carry lookahead to perform arithmetic operations with reduced delay.
The data-path symbol for a typical arithmetic logic unit is shown in Figure 11.3.
The 32-bit A and B data inputs come from other sources in the data-path; the S output goes to a data-path destination. The operation signals come from the control unit; the carry-out signal is routed back to the control unit so that it may detect certain exceptional conditions, such as overflow, that may disrupt the normal sequencing of instructions. We construct other data-path objects, such as shifters, registers, and register files, in an analogous manner.

11.1.3 Block Diagram/Register Transfer

A computer with only a single data register, usually called the accumulator or AC, is the simplest machine organization.
Figure 11.4 shows the block diagram for such a single accumulator machine.
Instructions for a single accumulator machine are called single address instructions. This is because they contain only a single reference to memory. One operand is implicitly the AC; the other is an operand in memory. The instructions are of the form AC := AC <operation> Memory (Address). <operation> could be ADD, SUBTRACT, AND, OR, and so on.

Let's consider an ADD instruction. The old value of the AC is replaced with the sum of the AC's contents and the contents of the specified memory location.
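As a sketch, the single-address form AC := AC <operation> Memory (Address) behaves as below; the two-field instruction encoding and the dictionary memory are illustrative assumptions:

```python
# Toy single-accumulator machine: one operand is implicitly the AC,
# the other comes from the single addressed memory location.
memory = {10: 42}
state = {"AC": 8}

def execute(op, address):
    operand = memory[address]                  # the one memory reference
    if op == "ADD":
        state["AC"] = state["AC"] + operand    # old AC replaced by the sum
    elif op == "SUB":
        state["AC"] = state["AC"] - operand

execute("ADD", 10)
```

After the ADD, the accumulator holds the sum of its old contents and the memory operand.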

Data and Control Flows Figure 11.4 shows the flow of data and control between memory, the control registers (IR, MAR, and PC), the data register (AC), and the functional units (ALU). The MAR is the Memory Address Register, a storage element that holds the address during memory accesses. Data flows are shown as bold arrowed lines; the other lines represent control.

The core of the data-path consists of the arithmetic logic unit and the AC. The AC is the source or destination of all transfers. These transfers are initiated by store, arithmetic, or load operations. Let's look at them in more detail.

The instruction identifies not only the operation to be performed but also the address of the memory operand. Store operations move the contents of the AC to a memory location specified by bits within the instruction. The sequencing begins by moving the specified address from the IR to the MAR. Then the contents of the AC are placed on the memory's data input lines while the MAR is placed onto its address lines. Finally, the memory control signals are cycled through a write sequence.

Arithmetic operations take as operands the contents of the accumulator and the memory location specified in the instruction. Again, the control moves the operand address from the IR to the MAR, but this time it invokes a memory read cycle. Data obtained from the load path is combined with the current contents of the AC to form the operation result. The result is then written back to the accumulator.

A load operation is actually a degenerate case of a normal arithmetic operation. The control obtains the B operand along the load path from memory, it places the ALU in a pass-through mode, and it stores the result in the AC.

Whereas load/store and arithmetic instructions manipulate the AC, branch instructions use the PC. If the instruction is an unconditional branch, the address portion of the IR replaces the PC, changing the next instruction to be executed. Similarly, a conditional branch replaces the PC if a condition specified in the instruction evaluates to true.

Placement of Instructions and Data There are two possible ways to connect the memory system to the CPU. The first is the so-called Princeton architecture: instructions and data are mixed in the same memory. In this case, the instruction and load/store paths are the same.

The alternative is the Harvard architecture. Data and instructions are stored in separate memories with independent paths into the processor.

The Princeton architecture is conceptually simpler and requires fewer connections to the memory, but the Harvard architecture has certain performance advantages. A Harvard architecture can fetch the next instruction even while executing the current instruction. If the current instruction needs to access memory to obtain an operand, the next instruction can still be moved into the processor. This strategy is called instruction prefetching, because the instruction is obtained before it is really needed. A Princeton architecture can prefetch instructions too. It is just more complicated to do so. To keep the discussion simple, we will assume a straightforward Princeton architecture in the rest of this chapter.

Detailed Instruction Trace As an example of the control signal and data flows needed to implement an instruction, let's trace a simple instruction that adds the contents of a specified memory location to the AC:

  1. The instruction must be fetched from memory. The control does this by moving the PC to the MAR and then initiating a memory read operation. The instruction comes from memory to the IR along the instruction path.

  2. Certain bits of the instruction encode the operation to be performed, that is, an ADD instruction. These are called the op code bits and are part of the inputs to the control finite state machine. We assume that the rest of the instruction's bits encode the address.

  3. The control moves the operand address bits to the MAR and begins a second memory read operation to fetch the operand.

  4. Once the data is available from memory along the load path, the control drives the ALU with signals instructing it to ADD its A and B operands to form the S result.

  5. The control then moves the S result into the AC to complete the execution of the instruction.

  6. The control increments the program counter to point at the next instruction. The machine returns to the first step.
Register Transfer Notation As you have seen so far, most of what the control does is transfer data from one register to another, asserting the appropriate control signals at the correct times. Control sequences are commonly described in terms of a special register transfer notation. Without defining the notation formally, we can illustrate the concept of register transfer with the instruction execution sequence described above:
Instruction Fetch:

    PC → MAR;                  Move PC to MAR
    Memory Read;               Assert Memory READ signal
    Memory → IR;               Load IR from Memory

Instruction Decode:

    IF IR<op code> = ADD_FROM_MEMORY THEN

Instruction Execution:

    IR<address bits> → MAR;    Move operand address to MAR
    Memory Read;               Assert Memory READ signal
    Memory → ALU B;            Gate Memory to ALU B
    AC → ALU A;                Gate AC to ALU A
    ALU ADD;                   Instruct ALU to perform ADD
    ALU S → AC;                Gate ALU result to AC
    PC increment;              Instruct PC to increment

We write the operation statements in terms of the control signals to be asserted, such as Memory Read, ALU ADD, or PC increment. We write register-to-register transfers in the form source register → destination register. The detailed pathways between registers determine the more refined register transfer description. We will see more register transfer descriptions in Section 11.2.
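As a sketch, the register transfer sequence for ADD_FROM_MEMORY can be mirrored step by step in software. The tuple instruction encoding and the dictionary memory are illustrative assumptions; each assignment stands for one asserted transfer:

```python
# Software mirror of the ADD_FROM_MEMORY register transfer sequence.
memory = {0: ("ADD_FROM_MEMORY", 7), 7: 30}   # instruction at 0, operand at 7
regs = {"PC": 0, "MAR": 0, "IR": None, "AC": 12}

# Instruction fetch
regs["MAR"] = regs["PC"]                # PC -> MAR
regs["IR"] = memory[regs["MAR"]]        # Memory Read; Memory -> IR

# Instruction decode
op_code, address_bits = regs["IR"]
if op_code == "ADD_FROM_MEMORY":
    # Instruction execution
    regs["MAR"] = address_bits          # IR<address bits> -> MAR
    alu_b = memory[regs["MAR"]]         # Memory Read; Memory -> ALU B
    alu_a = regs["AC"]                  # AC -> ALU A
    alu_s = alu_a + alu_b               # ALU ADD
    regs["AC"] = alu_s                  # ALU S -> AC
    regs["PC"] += 1                     # PC increment
```

At the end, the AC holds the sum and the PC points at the next instruction, ready for the next fetch.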


11.1.4 Interface to Memory

Figure 11.4 showed a reasonably generic interface to memory. A more realistic view for a Princeton architecture machine is shown in Figure 11.5.
The key elements are the two special registers, MAR and MBR, and the three control signals, Request, Read/Write, and Wait. Let's start with the registers.
We have seen the MAR before. In Figure 11.5, it can be loaded from the program counter for instruction fetch or from the IR with a load or store address. To decouple the memory from the internal working of the processor, we introduce a second interface register, the Memory Buffer Register, or MBR. A bidirectional path for load/store data exists between the processor data-path and the MBR, while the pathway for instructions between the MBR and IR is unidirectional.

Besides the address and data lines, the interface to memory consists of three control signals. The Request signal notifies the memory that the processor wishes to access it. The Read/Write signal specifies the direction: read from memory on a load and write to memory on a store. The Wait signal lets memory stall the processor, in effect notifying the processor that its memory request has not yet been serviced. We can think of Wait as the complement of an acknowledgment signal.

Processor-Memory Handshaking In their most general form, the memory system and the processor do not share a common clock. To ensure proper transfer of data, we should follow the four-cycle signaling convention of Section 6.5.2. The processor asserts the read/write direction, places the address in the MAR (and the data in the MBR if a write), and asserts Request. The memory normally asserts Wait, unasserting it when the read or write is complete.

When the processor notices that Wait is no longer asserted, it latches data into the MBR on a read or tri-states the data connection to memory on a write. The processor unasserts its Request line and must wait for the Wait signal to be reasserted by the memory before it can issue its next memory request.
The signaling waveforms are shown in Figure 11.6.
The four-cycle handshake of the Request and Wait signals for the read sequence works as follows:
Cycle 1: Request asserted. Read data placed on memory data bus.
Cycle 2: Wait unasserted. CPU latches read data into MBR.
Cycle 3: Request unasserted.
Cycle 4: Wait asserted.
In this signaling convention, a new request can be made only after the Wait signal is asserted. The write cycle is analogous.
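The four cycles can be simulated with a toy memory model. The `Memory` class, its fixed completion delay, and the polling loops are illustrative assumptions standing in for the hardware waveforms of Figure 11.6:

```python
# Toy simulation of the four-cycle Request/Wait handshake for a read.
class Memory:
    def __init__(self, data, delay):
        self.data, self.delay, self.wait = data, delay, True

    def tick(self, request, addr):
        # Wait stays asserted until the read completes, then drops; it is
        # reasserted once Request goes away (ready for a new request).
        if request:
            if self.delay == 0:
                self.wait = False          # cycle 2: read data is valid
            else:
                self.delay -= 1
        else:
            self.wait = True               # cycle 4: ready again
        return self.data.get(addr)

def cpu_read(mem, addr):
    data = None
    while True:                            # cycle 1: assert Request...
        data = mem.tick(request=True, addr=addr)
        if not mem.wait:                   # ...loop until Wait drops,
            break                          # then latch data into the MBR
    mem.tick(request=False, addr=None)     # cycle 3: unassert Request
    return data                            # cycle 4: Wait asserted again

mem = Memory({5: 99}, delay=2)
mbr = cpu_read(mem, 5)
```

After the call, the MBR holds the read data and Wait is asserted again, so a new request may be issued.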
Figure 11.7 shows possible state machine fragments for implementing the four-cycle handshake with memory. We assume a Moore machine controller implementation. In the read cycle, we enter a state that drives the address bus from the MAR, asserts the Read and Request signals, and latches the data bus into the MBR. This last transfer catches correct data only if memory has unasserted Wait, so we must loop in this state until this is true. On exit to the next state, the Request signal is unasserted and the address bus is no longer driven. The memory signals that it is ready for a new request by asserting Wait. To remain interlocked with memory, we loop in this state until Wait is asserted. The write cycle is similar.

Depending on detailed setup and hold time requirements, it may be necessary to insert additional states in the fragments of Figure 11.7. For example, if the memory system requires that the address lines and read/write direction be stable before the request is asserted, this should be done in a state preceding the one that asserts Request.

Remember that only the register transfer operations being asserted in a given state need to be written there. If an operation is not mentioned in a state (or state transition for a Mealy machine), it is implicitly unasserted. Thus, you don't have to explicitly set Request to its unasserted value in the second state of the handshake fragments. However, you should include such register transfer operations to improve the clarity of your state diagram.


11.1.5 Input/Output: The Third Component of Computer Organization

We have dealt with the interconnections between the processor and memory. The organization of the computer has a third component: input/output devices. We cover them only briefly here.

Input/output devices provide the computer's communication with the outside world. They include displays, printers, and massive storage devices, such as magnetic disks and tapes. For the purposes of this discussion, the main attribute of I/O devices is that they are typically much slower than the processor to which they are attached.

Memory-Mapped I/O I/O devices can be coupled to the processor in two primary ways: via a dedicated I/O bus or by sharing the same bus as the memory system. Almost all modern computers use the second method. One advantage of this method is that there are no special instructions to perform I/O operations. Load and store operations can initiate device reads and writes if they are directed to addresses that are recognized by the I/O device rather than memory. This strategy is called memory-mapped I/O because the devices appear to the processor as though they were part of the memory system.
I/O access times are measured in milliseconds, whereas memory access times are usually less than a microsecond. It isn't productive to hold up the processor for thousands of instruction times while the I/O device does its job. Therefore, the control coupling between the processor and I/O devices is somewhat more complex than the memory interface.
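Memory-mapped I/O can be sketched as an address decoder in front of memory; the IO_BASE boundary and the device register layout below are arbitrary assumptions for illustration:

```python
# Sketch: ordinary load/store operations reach either memory or a device,
# decided purely by the address -- no special I/O instructions needed.
IO_BASE = 0xFF00                 # assumed start of the memory-mapped region
memory = {}
device_regs = {}                 # registers of a hypothetical I/O controller

def store(addr, value):
    if addr >= IO_BASE:
        device_regs[addr - IO_BASE] = value    # store becomes a device write
    else:
        memory[addr] = value

def load(addr):
    if addr >= IO_BASE:
        return device_regs.get(addr - IO_BASE, 0)   # load reads the device
    return memory.get(addr, 0)

store(0x0010, 1)                 # ordinary memory store
store(0xFF04, 2)                 # same instruction, but lands in the device
```

The processor issues the same load and store operations in both cases; only the address decides whether memory or the device responds.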

Polling Versus Interrupts Because of the (relatively) long time to execute I/O operations, they are normally performed in parallel with CPU processing. An I/O device often has its own controller, essentially an independent computer that handles the details of device control. The CPU asks the controller to perform an I/O operation, usually by writing information to memory-mapped control registers. The processor continues to execute a stream of instructions while the I/O controller services its request.

The I/O controller notifies the CPU when its operation is complete. It can do this in two main ways: polling and interrupts. In polling, the I/O controller places its status in a memory-mapped register that the CPU can access. Every once in a while, the system software running on the CPU issues an instruction to examine the status register to see if the request is complete.

With interrupts, when the I/O operation is complete, the controller asserts a special control input to the CPU called the interrupt line. This forces the processor's state machine into a special interrupt state. The current state of the processor's registers, such as the PC and AC, is saved to special memory locations. The PC is overwritten with a distinguished address, where the system software's code for interrupt handling can be found. The instructions at this location handle the interrupt by copying data from the I/O device to memory where other programs can access it.

Polling is used in some very high performance computers that cannot afford to have their instruction sequencing disturbed by an I/O device's demand for attention. Interrupt-based I/O is used in almost all other computers, such as personal computers and time-sharing systems.
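The polling alternative can be sketched as a load of a memory-mapped status register; the register address and the position of the completion bit are illustrative assumptions:

```python
# Polling sketch: system software occasionally loads the controller's
# memory-mapped status register and tests a completion bit.
STATUS_ADDR = 0xFF00     # assumed address of the device status register
DONE_BIT = 0x1           # assumed "request complete" flag

io_space = {STATUS_ADDR: 0}

def io_complete():
    return bool(io_space[STATUS_ADDR] & DONE_BIT)

busy = io_complete()                 # controller still working
io_space[STATUS_ADDR] |= DONE_BIT    # controller signals completion
done = io_complete()                 # the next poll sees the request complete
```

Each call to `io_complete` stands for one status-check instruction issued by the system software between useful work.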

Changes to the Control State Diagram We need only modest changes to add interrupt support to the basic processor state diagram of Figure 11.2. Before fetching a new instruction, the processor checks to see whether an interrupt request is pending. If not, it continues with normal instruction fetch and execution.

If an interrupt has been requested, the processor simply enters its special interrupt state sequence. It saves the state of the machine, particularly the PC, and tells the I/O device through a standard handshake that it has seen the interrupt request. At this point, the machine returns to normal instruction fetch and execution, except that the PC now points to the first instruction of the system software's interrupt handler code.

A machine with interrupts usually provides a Return from Interrupt instruction. The system software executes this instruction at the end of its interrupt handling code, restoring the machine's saved state and returning control to the program that was running when the interrupt took place.
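The interrupt round trip described above can be sketched as follows; the saved-PC memory location and the handler entry address are arbitrary assumptions:

```python
# Sketch of interrupt entry and Return from Interrupt.
SAVED_PC_ADDR = 0        # assumed memory location for the saved PC
HANDLER_ADDR = 100       # assumed interrupt handler entry point

def check_interrupt(state, memory, pending):
    # Checked before each instruction fetch (the modified Figure 11.2 loop).
    if pending:
        memory[SAVED_PC_ADDR] = state["PC"]   # save the machine state
        state["PC"] = HANDLER_ADDR            # next fetch is handler code
        return True                           # acknowledge to the device
    return False

def return_from_interrupt(state, memory):
    # Executed as the handler's last instruction: restore the saved state.
    state["PC"] = memory[SAVED_PC_ADDR]

state, memory = {"PC": 7}, {}
check_interrupt(state, memory, pending=True)
```

After entry, the PC points at the handler while the interrupted program's PC sits in memory, ready for `return_from_interrupt` to restore it.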