4.1. The Central Processor - Control and Dataflow
Figure 4.1. Schematic diagram of a modern von Neumann processor, where the CPU is denoted by a shaded box - adapted from [Maf01].
- Processor (CPU) is the active part of the computer, which does all the work of data manipulation and decision making.
- Datapath is the hardware that performs all the required operations, for example, ALU, registers, and internal buses.
- Control is the hardware that tells the datapath what to do, in terms of switching, operation selection, data movement between ALU components, etc.
Figure 4.2. Schematic diagram of the processor in Figure 4.1, adapted from [Maf01].
Figure 4.3. Schematic diagram of MIPS architecture from an implementational perspective, adapted from [Maf01].
4.1.2. Register File
Figure 4.4. Register file (a) block diagram, (b) implementation of two read ports, and (c) implementation of write port - adapted from [Maf01].
4.2. Datapath Design and Implementation
Figure 4.5. Schematic high-level diagram of MIPS datapath from an implementational perspective, adapted from [Maf01].
Figure 4.6. Schematic diagram of Data Memory and Sign Extender, adapted from [Maf01].
4.2.1. R-format Datapath
Figure 4.7. Schematic diagram R-format instruction datapath, adapted from [Maf01].
4.2.2. Load/Store Datapath
lw $t1, offset($t2), where offset denotes a memory address offset applied to the base address in register $t2. The lw instruction reads from memory and writes into register $t1. The sw instruction reads from register $t1 and writes into memory. In order to compute the memory address, the MIPS ISA specification says that we have to sign-extend the 16-bit offset to a 32-bit signed value. This is done using the sign extender shown in Figure 4.6.
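As a minimal sketch of what the sign extender does, the Python below widens a 16-bit offset field to a 32-bit two's-complement pattern. The function names sign_extend16 and to_signed32 are illustrative and do not come from the text.

```python
def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    value &= 0xFFFF                    # keep only the low 16 bits
    if value & 0x8000:                 # bit 15 set means a negative offset
        return value | 0xFFFF0000      # copy the sign bit into bits 31-16
    return value


def to_signed32(bits: int) -> int:
    """Interpret a 32-bit pattern as a signed Python integer."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits


# An offset field of 0xFFFC encodes -4 once it is sign-extended.
offset_bits = sign_extend16(0xFFFC)
offset_value = to_signed32(offset_bits)
```

Keeping the extended value as an unsigned 32-bit pattern mirrors the hardware, where the ALU simply adds bit patterns and the signed interpretation is implicit.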
- Register Access takes input from the register file, to implement the instruction, data, or address fetch step of the fetch-decode-execute cycle.
- Memory Address Calculation decodes the base address and offset, combining them to produce the actual memory address. This step uses the sign extender and ALU.
- Read/Write from Memory takes data or instructions from the data memory, and implements the first part of the execute step of the fetch/decode/execute cycle.
- Write into Register File puts data retrieved from the data memory into the register file (load instructions only), implementing the second part of the execute step of the fetch/decode/execute cycle.
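The four load/store steps can be sketched as follows. This is an illustrative model only: the register file and data memory are plain dictionaries, and the helper names (effective_address, load_word, store_word) are assumptions rather than names used in the text. Registers $t1 and $t2 are taken to be registers 9 and 10, following the usual MIPS convention.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if (value & 0x8000) else value


def effective_address(base, offset16):
    """Memory address calculation: base register plus sign-extended offset."""
    return (base + sign_extend16(offset16)) & MASK32


def load_word(regs, mem, rt, rs, offset16):
    """lw: read memory at the effective address, then write register rt."""
    addr = effective_address(regs[rs], offset16)
    regs[rt] = mem.get(addr, 0)        # unwritten memory reads as 0 in this model
    return addr


def store_word(regs, mem, rt, rs, offset16):
    """sw: read register rt, then write memory at the effective address."""
    addr = effective_address(regs[rs], offset16)
    mem[addr] = regs[rt] & MASK32
    return addr


regs = {9: 0, 10: 0x1000}              # $t1 = 0, $t2 = 0x1000
mem = {0x0FFC: 0xDEADBEEF}
lw_addr = load_word(regs, mem, rt=9, rs=10, offset16=0xFFFC)   # lw $t1, -4($t2)
sw_addr = store_word(regs, mem, rt=9, rs=10, offset16=8)       # sw $t1, 8($t2)
```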
Figure 4.8. Schematic diagram of the Load/Store instruction datapath. Note that the execute step also includes writing of data back to the register file, which is not shown in the figure, for simplicity [MK98].
4.2.3. Branch/Jump Datapath
beq $t1, $t2, offset, where offset is a 16-bit offset for computing the branch target address via PC-relative addressing. The beq instruction reads from registers $t1 and $t2, then compares the data obtained from these registers to see if they are equal. If equal, the branch is taken. Otherwise, the branch is not taken.
- Register Access takes input from the register file, to implement the instruction fetch or data fetch step of the fetch-decode-execute cycle.
- Calculate Branch Target - Concurrent with ALU #1's evaluation of the branch condition, ALU #2 calculates the branch target address, to be ready for the branch if it is taken. This completes the decode step of the fetch-decode-execute cycle.
- Evaluate Branch Condition and Jump to BTA or PC+4 uses ALU #1 in Figure 4.9, to determine whether or not the branch should be taken. Jump to BTA or PC+4 uses control logic hardware to transfer control to the instruction referenced by the branch target address. This effectively changes the PC to the branch target address, and completes the execute step of the fetch-decode-execute cycle.
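The branch steps above can be sketched in a few lines of Python. The sketch assumes the usual PC-relative form, in which the sign-extended offset is shifted left two bits and added to PC + 4; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if (value & 0x8000) else value


def branch_target(pc, offset16):
    """Second adder: BTA = (PC + 4) + (sign-extended offset << 2)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


def next_pc_beq(pc, a, b, offset16):
    """First ALU subtracts; its Zero output selects BTA or PC + 4."""
    zero = ((a - b) & MASK32) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32


taken = next_pc_beq(0x00400000, 5, 5, 3)        # equal operands: branch taken
not_taken = next_pc_beq(0x00400000, 5, 6, 3)    # unequal operands: fall through
```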
Figure 4.9. Schematic diagram of the Branch instruction datapath. Note that, unlike the Load/Store datapath, the execute step does not include writing of results back to the register file [MK98].
4.3. Single-Cycle and Multicycle Datapaths
4.3.1. Single Datapaths
- The second ALU input is a register (R-format instruction) or the sign-extended lower 16 bits of the instruction (e.g., a load/store offset).
- The value written to the register file is obtained from the ALU (R-format instruction) or memory (load/store instruction).
Figure 4.10. Schematic diagram of a composite datapath for R-format and load/store instructions [MK98].
Figure 4.11. Schematic diagram of a composite datapath for R-format, load/store, and branch instructions [MK98].
ALU Control Input    Function
-----------------    --------
000                  and
001                  or
010                  add
110                  sub
111                  slt
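A small behavioral model may help make the control codes concrete. The sketch below implements the five functions in the table on 32-bit values and also produces the Zero output used for branches. Treating slt as a signed comparison is the standard MIPS behavior; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for a 3-bit ALU control code."""
    a &= MASK32
    b &= MASK32
    if control == 0b000:
        result = a & b                                           # and
    elif control == 0b001:
        result = a | b                                           # or
    elif control == 0b010:
        result = (a + b) & MASK32                                # add
    elif control == 0b110:
        result = (a - b) & MASK32                                # sub
    elif control == 0b111:
        result = 1 if to_signed32(a) < to_signed32(b) else 0     # slt
    else:
        raise ValueError(f"unsupported ALU control code: {control:03b}")
    return result, result == 0
```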
ALUop). The ALUop signal denotes whether the operation should be one of the following:
ALUop Input    Operation
-----------    -------------------------------------------
00             load/store
01             beq
10             determined by funct field of the instruction
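The two-level decoding can be sketched as a lookup. The sketch assumes that load/store uses the ALU to add (address calculation), that beq uses it to subtract (comparison), and that for ALUop = 10 the operation is selected by the funct field (bits 5-0). The funct encodings shown are the standard MIPS values for add, sub, and, or, and slt; the names FUNCT_TO_ALU and alu_control are illustrative.

```python
# Standard MIPS funct codes mapped to the 3-bit ALU control input.
FUNCT_TO_ALU = {
    0b100000: 0b010,   # add
    0b100010: 0b110,   # sub
    0b100100: 0b000,   # and
    0b100101: 0b001,   # or
    0b101010: 0b111,   # slt
}


def alu_control(alu_op, funct=0):
    """Combine the 2-bit ALUop with the funct field to get the ALU control input."""
    if alu_op == 0b00:
        return 0b010                       # load/store: add base and offset
    if alu_op == 0b01:
        return 0b110                       # beq: subtract to compare
    if alu_op == 0b10:
        return FUNCT_TO_ALU[funct & 0x3F]  # R-format: look at the funct field
    raise ValueError(f"unsupported ALUop: {alu_op:02b}")
```

Splitting the decode this way keeps the main control unit small, since it only has to produce two bits rather than the full ALU control code.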
- Bits 31-26: opcode - always at this location
- Bits 25-21 and 20-16: input register indices - always at this location
- Bits 25-21: base register for load/store instruction - always at this location
- Bits 15-0: 16-bit offset for branch instruction - always at this location
- Bits 15-11: destination register for R-format instruction - always at this location
- Bits 20-16: destination register for load/store instruction - always at this location
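Because the fields sit at fixed bit positions, they can be extracted with shifts and masks. The following sketch pulls out every field listed above; the dictionary keys (rs, rt, rd, funct, imm16) are conventional MIPS field names chosen for illustration.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31-26
        "rs":     (instr >> 21) & 0x1F,   # bits 25-21: first input / base register
        "rt":     (instr >> 16) & 0x1F,   # bits 20-16: second input / load destination
        "rd":     (instr >> 11) & 0x1F,   # bits 15-11: R-format destination
        "funct":  instr & 0x3F,           # bits 5-0: R-format function code
        "imm16":  instr & 0xFFFF,         # bits 15-0: load/store or branch offset
    }


add_fields = decode_fields(0x01094820)    # add $t1, $t0, $t1
lw_fields = decode_fields(0x8D490008)     # lw $t1, 8($t2)
```

All fields are extracted unconditionally, as in the hardware; the control signals decide which of them are actually used for a given instruction.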
Figure 4.12. Schematic diagram of composite datapath for R-format, load/store, and branch instructions (from Figure 4.11) with control signals and extra multiplexer for WriteReg signal generation [MK98].
- RegDst. Deasserted: Register destination number for the Write register is taken from bits 20-16 (rt field) of the instruction. Asserted: Register destination number for the Write register is taken from bits 15-11 (rd field) of the instruction.
- RegWrite. Deasserted: No action. Asserted: Register on the WriteRegister input is written with the value on the WriteData input.
- ALUSrc. Deasserted: The second ALU operand is taken from the second register file output (ReadData 2). Asserted: The second ALU operand is the sign-extended lower 16 bits of the instruction.
- PCSrc. Deasserted: PC is overwritten by the output of the adder (PC + 4). Asserted: PC is overwritten by the branch target address.
- MemRead. Deasserted: No action. Asserted: Data memory contents designated by the address input are present at the ReadData output.
- MemWrite. Deasserted: No action. Asserted: Data memory contents designated by the address input are replaced by the value on the WriteData input.
- MemtoReg. Deasserted: The value present at the register WriteData input is output from the ALU. Asserted: The value present at the register WriteData input is taken from data memory.
beq and the Zero output of the ALU used for comparison is true. PCSrc is generated by and-ing a Branch signal from the control unit with the Zero signal from the ALU. Thus, all control signals can be set based on the opcode bits. The resultant datapath and its signals are shown in detail in Figure 4.13.
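Setting the control signals from the opcode can be sketched as a table lookup. The signal values below follow the conventional single-cycle MIPS control table for R-format, lw, sw, and beq; they are stated here as an assumption, since the text does not list them. None marks a don't-care value, and mem_to_reg names the signal that selects between the ALU output and data memory for the register WriteData input.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    reg_dst: Optional[int]
    alu_src: int
    mem_to_reg: Optional[int]
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int


# Keyed by the 6-bit opcode (bits 31-26).
CONTROL_TABLE = {
    0b000000: Control(1, 0, 0, 1, 0, 0, 0, 0b10),        # R-format
    0b100011: Control(0, 1, 1, 1, 1, 0, 0, 0b00),        # lw
    0b101011: Control(None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    0b000100: Control(None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}


def pc_src(control, alu_zero):
    """PCSrc = Branch AND Zero: take the branch only for beq with equal operands."""
    return bool(control.branch and alu_zero)


beq_control = CONTROL_TABLE[0b000100]
r_control = CONTROL_TABLE[0b000000]
```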
Figure 4.13. Schematic diagram of composite datapath for R-format, load/store, and branch instructions (from Figure 4.12) with control signals illustrated in detail [MK98].
4.3.2. Datapath Operation
add $t1, $t0, $t1) using the datapath developed in Section 4.3.1 involves the following steps:
- Fetch instruction from instruction memory and increment PC
- Input registers (e.g., $t0 and $t1) are read from the register file
- ALU operates on data from register file using the funct field of the MIPS instruction (Bits 5-0) to help select the ALU operation
- Result from ALU written into register file using bits 15-11 of instruction to select the destination register (e.g., $t1)
lw $t1, offset($t2)) using the datapath developed in Section 4.3.1 involves the following steps:
- Fetch instruction from instruction memory and increment PC
- Read register value (e.g., base address in $t2) from the register file
- ALU adds the base address from register $t2 to the sign-extended lower 16 bits of the instruction (i.e., offset)
- Result from ALU is applied as an address to the data memory
- Data retrieved from the memory unit is written into the register file, where the register index is given by $t1 (Bits 20-16 of the instruction).
beq $t1, $t2, offset) using the datapath developed in Section 4.3.1 involves the following steps:
- Fetch instruction from instruction memory and increment PC
- Read registers (e.g., $t1 and $t2) from the register file. The adder sums PC + 4 plus the sign-extended lower 16 bits of the instruction (offset) shifted left by two bits, thereby producing the branch target address (BTA).
- ALU subtracts contents of $t2 from contents of $t1. The Zero output of the ALU directs which result (PC+4 or BTA) to write as the new PC.
4.3.3. Extended Control for New Instructions
- Bits 31-28: Upper four bits of (PC + 4)
- Bits 27-02: Immediate field of jump instruction
- Bits 01-00: Zero (00 in binary)
- An additional multiplexer, to select the source for the new PC value. To cover all cases, this source is PC+4, the conditional BTA, or the JTA.
- An additional control signal for the new multiplexer, asserted only for a jump instruction (opcode = 2).
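The jump target address (JTA) and the widened PC-source selection can be sketched as below. The function names and the argument order of select_next_pc are illustrative assumptions.

```python
def jump_target(pc, instr):
    """JTA = upper four bits of (PC + 4), 26-bit immediate, then two zero bits."""
    upper = (pc + 4) & 0xF0000000          # bits 31-28
    imm26 = instr & 0x03FFFFFF             # bits 25-0 of the jump instruction
    return upper | (imm26 << 2)            # bits 27-2, with bits 1-0 left at zero


def select_next_pc(pc_plus_4, bta, jta, branch_taken, jump):
    """New multiplexer: the jump control signal overrides the branch selection."""
    if jump:
        return jta
    return bta if branch_taken else pc_plus_4


j_instr = (2 << 26) | 0x0100004            # opcode 2 (jump) with a 26-bit immediate
jta = jump_target(0x00400000, j_instr)
```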
Figure 4.14. Schematic diagram of composite datapath for R-format, load/store, branch, and jump instructions, with control signals labelled [MK98].
4.3.4. Limitations of the Single-Cycle Datapath
4.3.5. Multicycle Datapath Design
- Each functional unit (e.g., Register File, Data Memory, ALU) can be used more than once in the course of executing an instruction, which saves hardware (and, thus, reduces cost); and
- Each instruction step takes one cycle, so different instructions have different execution times. In contrast, the single-cycle datapath that we designed previously required every instruction to take one cycle, so all the instructions move at the speed of the slowest.
- In the multicycle datapath, one memory unit stores both instructions and data, whereas the single-cycle datapath requires separate instruction and data memories.
- The multicycle datapath uses one ALU, versus an ALU and two adders in the single-cycle datapath, because signals can be rerouted through the ALU in a multicycle implementation.
- In the single-cycle implementation, the instruction executes in one cycle (by design) and the outputs of all functional units must stabilize within one cycle. In contrast, the multicycle implementation uses one or more registers to temporarily store (buffer) the ALU or functional unit outputs. This buffering action stores a value in a temporary register until it is needed or used in a subsequent clock cycle.
Figure 4.15. Simple multicycle datapath with buffering registers (Instruction register, Memory data register, A, B, and ALUout) [MK98].
- Programmer-Visible (register file, PC, or memory), in which data is stored that is used by subsequent instructions (in a later clock cycle); and
- Additional State Elements (buffer registers), in which data is stored that is used in a later clock cycle of the same instruction.
- Memory access
- Register file access (two reads or one write)
- ALU operation (arithmetic or logical)
- Instruction Register (IR) saves the data output from the Text Segment of memory for a subsequent instruction read;
- Memory Data Register (MDR) saves memory output for a data read operation;
- A and B Registers (A,B) store ALU operand values read from the register file; and
- ALU Output Register (ALUout) contains the result produced by the ALU.
- Add a multiplexer to the first ALU input, to choose between (a) the A register as input (for R- and I-format instructions), or (b) the PC as input (for branch instructions).
- On the second ALU input, the source is selected by a four-way mux (two control bits). The two additional inputs to the mux are (a) the immediate (constant) value 4 for incrementing the PC and (b) the sign-extended offset, shifted left two bits to preserve alignment, which is used in computing the branch target address.
- Write Control Signals for the IR and programmer-visible state units
- Read Control Signal for the memory; and
- Control Lines for the muxes.
- ALU output = PC + 4, to get the next instruction during the instruction fetch step (to do this, PC + 4 is written directly to the PC)
- Register ALUout, which stores the computed branch target address.
- Lower 26 bits (offset) of the IR, shifted left by two bits (to preserve alignment) and concatenated with the upper four bits of PC+4, to form the jump target address.
beq instruction are equal and (b) the result of (ALUZero and PCWriteCond) determines whether the PC should be written during a conditional branch. We call the latter the branch taken condition. Figure 4.16 shows the resultant multicycle datapath and control unit with new muxes and corresponding control signals. Table 4.4 illustrates the control signals and their functions.
4.3.6. Multicycle Datapath and Instruction Execution
- ALU operation
- Register file access (two reads or one write)
- Memory access (one read or one write)
Figure 4.16. MIPS multicycle datapath [MK98].
IR = Memory[PC]    # Put contents of Memory[PC] in Instruction Register
PC = PC + 4        # Increment the PC by 4 to preserve alignment
- Applicable to all instructions and
- Not harmful to any instruction.
A = RegFile[IR[25:21]]                       # First operand = register indexed by Bits 25-21 of instruction
B = RegFile[IR[20:16]]                       # Second operand = register indexed by Bits 20-16 of instruction
ALUout = PC + (SignExtend(IR[15:0]) << 2)    # Compute BTA
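The fetch and decode steps can be sketched as two functions acting on a dictionary of buffer registers. This is a behavioral illustration only: memory is a dictionary keyed by byte address, and the names fetch, decode, and state are assumptions. Note that the PC has already been incremented when decode runs, so ALUout holds (PC + 4) plus the shifted offset.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if (value & 0x8000) else value


def fetch(state, memory):
    """Step 1: IR = Memory[PC]; PC = PC + 4."""
    state["IR"] = memory[state["PC"]]
    state["PC"] = (state["PC"] + 4) & MASK32


def decode(state, regfile):
    """Step 2: read both registers and speculatively compute the BTA."""
    ir = state["IR"]
    state["A"] = regfile[(ir >> 21) & 0x1F]          # IR[25:21]
    state["B"] = regfile[(ir >> 16) & 0x1F]          # IR[20:16]
    state["ALUout"] = (state["PC"] + (sign_extend16(ir & 0xFFFF) << 2)) & MASK32


memory = {0x00400000: 0x112A0003}                     # beq $t1, $t2, 3
regfile = {9: 7, 10: 7}
state = {"PC": 0x00400000}
fetch(state, memory)
decode(state, regfile)
```

Both steps run before the instruction type is known, which is why they must be applicable to, and harmless for, every instruction.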
- Memory Reference:
ALUout = A + SignExtend(IR[15:0])
The ALU constructs the memory address from the base address (stored in A) and the offset (taken from the low 16 bits of the IR). Control signals are set as described on p. 387 of the textbook.
- R-format Instruction:
ALUout = A op B
The ALU takes its inputs from buffer registers A and B and computes a result according to control signals determined by the instruction opcode, the function field, and the control signal ALUop = 10. The control signals are further described on p. 387 of the textbook.
- Branch:
if (A == B) then PC = ALUout
In branch instructions, the ALU performs the comparison between the contents of registers A and B. If A = B, then the Zero output of the ALU is asserted and the PC is overwritten with the BTA that was computed in the preceding (decode) step and is held in ALUout. If the branch is not taken, then the PC+4 value written during instruction fetch remains in the PC. This covers all possibilities, because the next instruction address is always the value most recently written into the PC. Salient hardware control actions are discussed on p. 387 of the textbook.
- Jump:
PC = PC[31:28] || (IR[25:0] << 2)
Here, the PC is replaced by the jump target address, which does not need the ALU to be computed, but can be formed in hardware as described on p. 387 of the textbook.
MDR = Memory[ALUout]    # Load
Memory[ALUout] = B      # Store
Reg[IR[15:11]] = ALUout # Write ALU result to register file
4.4. Finite State Control
4.4.1. Finite State Machine
- Current state and inputs;
- Next-state function, also called the transition function, which converts inputs to (a) a new state, and (b) outputs of the FSM; and
- Outputs, which in the case of the multicycle datapath, are control signals that are asserted when the FSM is in a given state.
4.4.2. Finite State Control
- Instruction fetch
- Instruction decode and data fetch
- ALU operation
- Memory access or R-format instruction completion
- Memory access completion
Figure 4.17. High-level (abstract) representation of finite-state machine for the multicycle datapath finite-state control. Figure numbers refer to figures in the textbook [Pat98,MK98].
Figure 4.18. Representation of finite-state control for the instruction fetch and decode states of the multicycle datapath. Figure numbers refer to figures in the textbook [Pat98,MK98].
- State 3: Performs memory access by asserting the MemRead signal, putting memory output into the MDR.
- State 5: Activated if the sw (store word) instruction is used, and MemWrite is asserted.
Figure 4.19. Representation of finite-state control for the memory reference states of the multicycle datapath. Figure numbers refer to figures in the textbook [Pat98,MK98].
Figure 4.20. Representation of finite-state control for the R-format instruction execution states of the multicycle datapath. Figure numbers refer to figures in the textbook [Pat98,MK98].
beq instruction can be implemented this way.
Figure 4.21. Representation of finite-state control for (a) branch and (b) jump instruction-specific states of the multicycle datapath. Figure numbers refer to figures in the textbook [Pat98,MK98].
4.4.3. FSC and Multicycle Datapath Performance
Figure 4.22. Representation of the composite finite-state control for the MIPS multicycle datapath [MK98].
- Load: 5 states
- Store: 4 states
- R-format ALU instructions: 4 states
- Branch: 3 states
- Jump: 3 states
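The state counts above can be checked against a small next-state function. The sketch assumes the conventional textbook numbering of the ten states (0 fetch, 1 decode, 2 address computation, 3 memory read, 4 load write-back, 5 memory write, 6 R-format execute, 7 R-format completion, 8 branch completion, 9 jump completion). Only States 3, 5, and 7 are named explicitly in these notes, so the remaining numbers are an assumption, as are the function names.

```python
def next_state(state, kind):
    """Transition function of the control FSM for one instruction class."""
    if state == 0:
        return 1                                              # fetch -> decode
    if state == 1:
        return {"lw": 2, "sw": 2, "r_format": 6, "beq": 8, "j": 9}[kind]
    if state == 2:
        return {"lw": 3, "sw": 5}[kind]                       # memory read or write
    if state == 3:
        return 4                                              # load write-back
    if state == 6:
        return 7                                              # R-format completion
    if state in (4, 5, 7, 8, 9):
        return 0                                              # back to fetch
    raise ValueError(f"unknown state: {state}")


def states_visited(kind):
    """List the states traversed by one instruction, starting at fetch."""
    path, state = [0], next_state(0, kind)
    while state != 0:
        path.append(state)
        state = next_state(state, kind)
    return path


cycle_counts = {k: len(states_visited(k)) for k in ("lw", "sw", "r_format", "beq", "j")}
```

Since each state takes one clock cycle, the length of each path is the cycle count for that instruction class.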
gcc benchmark is 4.02, a savings of approximately 20 percent over the worst-case CPI (equal to 5 cycles for all instructions, based on the single-cycle datapath design constraint that all instructions run at the speed of the slowest).
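The CPI figure is a weighted average of the per-class cycle counts. The sketch below shows the arithmetic. The instruction mix is an assumption chosen for illustration (it is not given in these notes) and happens to be consistent with the 4.02 figure; the cycle counts are the state counts listed above.

```python
CYCLES = {"load": 5, "store": 4, "r_format": 4, "branch": 3, "jump": 3}

# Assumed instruction mix (fractions of executed instructions); illustrative only.
ASSUMED_MIX = {"load": 0.23, "store": 0.13, "r_format": 0.43, "branch": 0.19, "jump": 0.02}


def average_cpi(mix, cycles=CYCLES):
    """CPI = sum over instruction classes of (fraction * cycles per instruction)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(mix[kind] * cycles[kind] for kind in mix)


cpi = average_cpi(ASSUMED_MIX)
savings = 1.0 - cpi / 5.0          # relative to 5 cycles for every instruction
```

With a different mix the CPI changes accordingly; a load-heavy program moves toward 5, a branch-heavy one toward 3.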
4.4.4. Implementation of Finite-State Control
4.5. Microprogrammed Control
- Microinstruction Format that formalizes the structure and content of the microinstruction fields and functionality;
- Sequencing Mechanism, which determines whether the next instruction, or one indicated by a branch control structure, will be executed; and
- Exception Handling that determines what actions control should take when an error occurs (e.g., arithmetic overflow).
4.5.1. Microinstruction Format
Field Name          Field Function
----------------    ------------------------------------------------------------
ALU control         Specify the operation performed by the ALU during this clock cycle, the result written to ALUout.
SRC1                Source for the first ALU operand
SRC2                Source for the second ALU operand
Register control    Specify read or write for Register File, as well as the source of a value to be written to the register file if write is enabled.
Memory              Specify read or write, and the source for a write. For a read, specify the destination register.
PCWrite control     Specify how the PC is to be written (e.g., PC+4, BTA, or JTA)
Sequencing          Specify how to choose the next microinstruction for execution
- Incrementation, by which the address of the current microinstruction is incremented to obtain the address of the next microinstruction. This is indicated by the value Seq in the Sequencing field of Table 4.5.
- Branching, to the microinstruction that initiates execution of the next MIPS instruction. This is implemented by the value Fetch in the Sequencing field.
- Control-directed choice, where the next microinstruction is chosen based on control input. We call this operation a dispatch. This is implemented by one or more address tables (similar to a jump table) called dispatch tables. The hardware implementation of dispatch tables is discussed in Section C.5 (Appendix C) of the textbook. In the current subset of MIPS whose multicycle datapath we have been implementing, we need two dispatch tables, one each for State 1 and State 2. The use of a dispatch table numbered i is indicated in the microinstruction by putting Dispatch i in the Sequencing field.
Field Name         Values for Field   Field Functionality
-----------------  ----------------   ------------------------------------------------------------
Label              Any string         Labels control sequencing, per p. 403 of the textbook
ALU control        Add                ALU performs addition operation
                   Subt               ALU performs subtraction operation
                   Func code          Instruction's funct field determines ALU operation
SRC1               PC                 The PC is the first ALU input
                   A                  Buffer register A is the first ALU input
SRC2               B                  Buffer register B is the second ALU input
                   4                  The constant 4 is the second ALU input (for PC+4)
                   Extend             Output of sign extension module is second ALU input
                   Extshft            Sign-extended offset, shifted left two bits, is second ALU input
Register control   Read               Read two registers using rs and rt fields of the current instruction, putting data into buffers A and B
                   Write ALU          Write to the register file using the rd field of the instruction register as the register number and the contents of ALUout as the data
                   Write MDR          Write to the register file using the rt field of the instruction register as the register number and the contents of the MDR as the data
Memory             Read PC            Read memory using the PC as the memory address, writing the result into the IR and MDR [implements instruction fetch]
                   Read ALU           Read memory using ALUout as the address, write the result into MDR
                   Write ALU          Write to memory using the ALUout contents as the address, writing to memory the data contained in buffer register B
PCWrite control    ALU                Write the output of the ALU into the PC register
                   ALUout-cond        If the ALU's Zero output is high, write the contents of ALUout into the PC register
                   Jump address       Write the PC with the jump address from the instruction
Sequencing         Seq                Choose the next microinstruction sequentially
                   Fetch              Go to the first microinstruction to begin a new MIPS instruction
                   Dispatch i         Dispatch using the ROM specified by i (where i = 1 or 2)
4.5.2. Microprogramming the Datapath Control
- A field that controls a functional unit (e.g., ALU, register file, memory) or causes state information to be written (e.g., ALU dest field), when blank, implies that no control signals should be asserted.
- A field that only specifies control of an input multiplexer for a functional unit, when left blank, implies that the datapath does not care about what value the output of the mux has.
Label     ALU control  SRC1  SRC2     Register control  Memory     PCWrite       Sequencing
--------  -----------  ----  -------  ----------------  ---------  ------------  ----------
Fetch     Add          PC    4        ---               Read PC    ALU           Seq
---       Add          PC    Extshft  Read              ---        ---           Dispatch 1
- ALU control, SRC1, and SRC2 are set to compute PC+4, which is written to ALUout. The memory field reads the instruction at address equal to PC, and stores the instruction in the IR. The PCWrite control causes the ALU output (PC + 4) to be written into the PC, while the Sequencing field tells control to go to the next microinstruction.
- The label field (value = fetch) will be used to transfer control in the next Sequencing field when execution of the next instruction begins.
- ALU control, SRC1, and SRC2 are set to store the PC plus the sign-extended, shifted IR[15:0] into ALUout. Register control causes data referenced by the rs and rt fields to be placed in ALU input registers A and B. The Sequencing field tells control to go to dispatch table 1 for the next microinstruction address.
Label     ALU control  SRC1  SRC2     Register control  Memory     PCWrite       Sequencing
--------  -----------  ----  -------  ----------------  ---------  ------------  ----------
Mem1      Add          A     Extend   ---               ---        ---           Dispatch 2
LW2       ---          ---   ---      ---               Read ALU   ---           Seq
---       ---          ---   ---      Write MDR         ---        ---           Fetch
Label     ALU control  SRC1  SRC2     Register control  Memory     PCWrite       Sequencing
--------  -----------  ----  -------  ----------------  ---------  ------------  ----------
Rformat1  Func code    A     B        ---               ---        ---           Seq
---       ---          ---   ---      Write ALU         ---        ---           Fetch
Label     ALU control  SRC1  SRC2     Register control  Memory     PCWrite       Sequencing
--------  -----------  ----  -------  ----------------  ---------  ------------  ----------
Beq1      Subt         A     B        ---               ---        ALUout-cond   Fetch
Label     ALU control  SRC1  SRC2     Register control  Memory     PCWrite       Sequencing
--------  -----------  ----  -------  ----------------  ---------  ------------  ----------
Jump1     ---          ---   ---      ---               ---        Jump address  Fetch
Label     ALU control  SRC1  SRC2     Register control  Memory     PCWrite       Sequencing
--------  -----------  ----  -------  ----------------  ---------  ------------  ----------
Fetch     Add          PC    4        ---               Read PC    ALU           Seq
---       Add          PC    Extshft  Read              ---        ---           Dispatch 1
Mem1      Add          A     Extend   ---               ---        ---           Dispatch 2
LW2       ---          ---   ---      ---               Read ALU   ---           Seq
---       ---          ---   ---      Write MDR         ---        ---           Fetch
SW2       ---          ---   ---      ---               Write ALU  ---           Fetch
Rformat1  Func code    A     B        ---               ---        ---           Seq
---       ---          ---   ---      Write ALU         ---        ---           Fetch
Beq1      Subt         A     B        ---               ---        ALUout-cond   Fetch
Jump1     ---          ---   ---      ---               ---        Jump address  Fetch
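To illustrate how the Sequencing field drives execution, the sketch below stores the complete microprogram as a list and follows Seq, Fetch, and Dispatch entries for one MIPS instruction. The contents of the two dispatch tables are an assumption based on the labels in the microprogram (the text only states that two tables are needed), and the names MICROPROGRAM, DISPATCH, and trace are illustrative.

```python
# Columns: label, ALU control, SRC1, SRC2, register control, memory, PCWrite, sequencing.
MICROPROGRAM = [
    ("Fetch",    "Add",       "PC", "4",       None,        "Read PC",   "ALU",          "Seq"),
    (None,       "Add",       "PC", "Extshft", "Read",      None,        None,           "Dispatch 1"),
    ("Mem1",     "Add",       "A",  "Extend",  None,        None,        None,           "Dispatch 2"),
    ("LW2",      None,        None, None,      None,        "Read ALU",  None,           "Seq"),
    (None,       None,        None, None,      "Write MDR", None,        None,           "Fetch"),
    ("SW2",      None,        None, None,      None,        "Write ALU", None,           "Fetch"),
    ("Rformat1", "Func code", "A",  "B",       None,        None,        None,           "Seq"),
    (None,       None,        None, None,      "Write ALU", None,        None,           "Fetch"),
    ("Beq1",     "Subt",      "A",  "B",       None,        None,        "ALUout-cond",  "Fetch"),
    ("Jump1",    None,        None, None,      None,        None,        "Jump address", "Fetch"),
]

# Map each label to its microinstruction address.
LABELS = {row[0]: addr for addr, row in enumerate(MICROPROGRAM) if row[0]}

# Assumed dispatch table contents, keyed by instruction class.
DISPATCH = {
    "Dispatch 1": {"lw": "Mem1", "sw": "Mem1", "r_format": "Rformat1",
                   "beq": "Beq1", "j": "Jump1"},
    "Dispatch 2": {"lw": "LW2", "sw": "SW2"},
}


def trace(kind):
    """Return the microinstruction addresses executed for one MIPS instruction."""
    addr, visited = LABELS["Fetch"], []
    while True:
        visited.append(addr)
        seq = MICROPROGRAM[addr][7]
        if seq == "Seq":
            addr += 1                               # incrementation
        elif seq == "Fetch":
            return visited                          # next MIPS instruction begins
        else:
            addr = LABELS[DISPATCH[seq][kind]]      # control-directed choice
```

The length of each trace equals the number of clock cycles the instruction takes, one microinstruction per cycle.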
4.5.3. Implementing a Microprogram
4.5.4. Exception Handling
- An exception is an anomalous event arising from within the processor, such as arithmetic overflow.
- An interrupt is an event that causes an unexpected change in control flow. Interrupts are assumed to originate outside the processor, for example, an I/O request.
- EPC: 32-bit register holds the address of the exception-causing instruction, and
- Cause: 32-bit register contains a binary code that describes the cause or type of exception.
CauseWrite, which write the appropriate information to the EPC and Cause registers. Also required in this particular implementation is a 1-bit signal to set the LSB of Cause to be 0 for an undefined instruction, or 1 for arithmetic overflow. Of further use is an address AE that points to the exception handling routine to which control is transferred. In MIPS, we assume that AE = C0000000 (hexadecimal).
Figure 4.23. Representation of the composite datapath architecture and control for the MIPS multicycle datapath, with provision for exception handling [MK98].
- Undefined Instruction: Finite state control must be modified to define the next-state value as 10 (the eleventh state of our control FSM) for all operation types other than the five that are allowed (i.e., lw, sw, beq, jump, and R-format). In the FSM diagram of Figure 4.25, this is shown as other.
- Arithmetic Overflow: Recall that an ALU can be designed to include overflow detection logic with a signal output from the ALU called overflow, which is asserted if overflow is detected. This is used to specify the next state for State 7 in the FSM of Figure 4.25.
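The effect of taking either exception can be sketched as follows. This is a behavioral illustration under stated assumptions: the caller supplies the address of the faulting instruction, the handler address is taken as C0000000 hexadecimal as assumed in the text, and the names add_overflows and raise_exception are hypothetical.

```python
MASK32 = 0xFFFFFFFF
HANDLER_ADDRESS = 0xC0000000     # AE, the exception handler entry point
CAUSE_UNDEFINED = 0              # LSB of Cause for an undefined instruction
CAUSE_OVERFLOW = 1               # LSB of Cause for arithmetic overflow


def add_overflows(a, b):
    """Signed 32-bit add overflows when both operands differ in sign from the result."""
    result = (a + b) & MASK32
    return ((a ^ result) & (b ^ result) & 0x80000000) != 0


def raise_exception(state, faulting_pc, cause_bit):
    """Save the faulting address, record the cause, and transfer control to AE."""
    state["EPC"] = faulting_pc & MASK32
    state["Cause"] = (state.get("Cause", 0) & ~1 & MASK32) | (cause_bit & 1)
    state["PC"] = HANDLER_ADDRESS


state = {"PC": 0x00400008}
if add_overflows(0x7FFFFFFF, 1):
    raise_exception(state, faulting_pc=0x00400004, cause_bit=CAUSE_OVERFLOW)
```

In the multicycle datapath the PC has already been incremented by the time the exception is detected, so the hardware must recover the faulting address before writing EPC; the sketch sidesteps this by taking the address as an argument.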
Figure 4.24. Representation of the finite-state models for two types of exceptions in the MIPS multicycle datapath [MK98].
Figure 4.25. Representation of the composite finite-state control for the MIPS multicycle datapath, including exception handling [MK98].