Wednesday, 28 August 2013

Memory Technology and Introduction to Cache

Primary technology today: transistor RAM (Text, Section C.9)
static RAM (SRAM):
  • uses flip-flops, like a register file
  • non-destructive read-out
  • fast (down to 1 ns access time for small memories)
dynamic RAM (DRAM):
  • uses a single transistor to store each bit
  • simpler structure makes memory cheaper (now < 5 cents/MB)
  • and allows for larger-capacity chips (now typically 1-2 Gb on a chip)
  • destructive read-out
  • requires regular refresh
  • slower (access time for a random word -- the latency -- can be as high as 50 ns)
Large DRAMs are organized as a 2-dimensional array; bits in the same row can be accessed faster than random bits (see the address-split sketch below).
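As a rough sketch of this organization, an address can be split into a row part and a column part; the 1024-columns-per-row size below is an assumption for illustration, not a figure from the text:

    #include <stdint.h>
    #include <stdio.h>

    #define COL_BITS 10  /* assumed: 1024 columns per row */

    int main(void) {
        uint32_t addr = 0x2A0F3;                        /* an arbitrary address */
        uint32_t row  = addr >> COL_BITS;               /* sent first (RAS)  */
        uint32_t col  = addr & ((1u << COL_BITS) - 1);  /* sent second (CAS) */
        printf("address 0x%X -> row %u, column %u\n", addr, row, col);
        /* Consecutive addresses usually share a row, so after the first
           access only a new column address is needed; a random address
           usually changes the row, which is the slow case. */
        return 0;
    }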

Recent DRAMs aim to improve the streaming rate -- the rate at which successive bytes can be read out -- through interfaces such as SDRAMs (synchronous DRAMs, which operate synchronously with the CPU clock) and DDR DRAMs (double data rate DRAMs). For some applications the memory contents never change -- these can use ROM (read-only memory).
Static and dynamic RAM are both volatile: data disappears when power is lost. For some applications, data must be preserved across power loss, so a special (slow) non-volatile RAM is used.

Caches

Memory Hierarchy (Text, Section 5.1)

There is a trade-off between memory speed, cost, and capacity:
  • SRAM -- fast, expensive, small (smaller SRAMs are faster)
  • DRAM -- intermediate in speed, cost, and capacity
  • disk -- slow, large, cheap
A cost-effective system must have a mix of all of these memories, which means that the system must manage its data so that it is rapidly available when it is needed. In earlier machines, all of this memory management was done explicitly by the programmer; now more of it is done automatically and invisibly by the system. The ideal is to create a system with the cost and capacity of the cheapest technology along with the speed of the fastest.

Locality

If memory access were entirely random, automatic memory management would not be possible. Management relies on two forms of locality (illustrated by the loop sketch after this list):
  • temporal locality: if a program has referenced a location, it is more likely to reference that same location again in the near future than to reference another random location
  • spatial locality: if a program has referenced a location, it is more likely to reference nearby locations in the near future than to reference other random locations
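A minimal sketch of both kinds of locality in one loop (the array and its size are arbitrary choices for illustration):

    #include <stdio.h>

    #define N 1024

    int main(void) {
        static int a[N];             /* elements adjacent in memory */
        int sum = 0;
        for (int i = 0; i < N; i++)  /* sum and i reused every iteration:
                                        temporal locality */
            sum += a[i];             /* a[0], a[1], ... accessed in order:
                                        spatial locality -- one block fetch
                                        serves several consecutive elements */
        printf("sum = %d\n", sum);
        return 0;
    }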

Cache (Text, Section 5.2)

A cache is an (automatically managed) level of memory between main memory and the CPU (registers). The goal with a cache is to get the speed of a small SRAM with the capacity of main memory. Each entry in the cache includes the data, the memory address (or a partial address, called the tag), and a valid bit.
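As a sketch of that layout (the field widths are assumptions for illustration, not from the text), one entry of a cache of one-word blocks might be declared as:

    #include <stdbool.h>
    #include <stdint.h>

    struct cache_entry {
        bool     valid;  /* has this entry been filled since power-up? */
        uint32_t tag;    /* the address bits not implied by the entry's
                            position in the cache */
        uint32_t data;   /* the cached word itself */
    };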
One issue in cache design is cache addressing:  determining where a word from main memory may be placed in the cache (Fig. 5.13).
  • fully associative:
    • any word of memory can be stored into any word of the cache
    • not practical except for very small caches
    • if the cache is full, evict the least recently used (LRU) entry
  • direct mapped
    • if cache has N words, location k of memory goes into word (k mod N) of cache (see the worked example after this list)
    • simplest cache design (Fig. 5.7) (or see Prof. Gottlieb's diagrams)
    • conflict between memory locations with same (k mod N) reduces performance, as compared to fully associative cache
  • set associative
    • typical designs are 2-way or 4-way set associative
    • somewhat greater complexity than direct mapped (more comparators and multiplexers) (Fig. 5.17 or Prof. Gottlieb's diagrams)
    • for 2-way, organize cache as S=N/2 sets of 2 words each
    • location k of memory goes into set (k mod S) of cache
    • approaches performance of fully associative
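A worked example of the two mapping formulas, with assumed sizes N = 1024 words and S = N/2 = 512 sets:

    #include <stdint.h>
    #include <stdio.h>

    #define N 1024u      /* assumed direct-mapped cache size, in words */
    #define S (N / 2)    /* 2-way set associative: S sets of 2 words each */

    int main(void) {
        uint32_t k = 70000;  /* an arbitrary memory word address */
        printf("direct mapped:   word %u, tag %u\n", k % N, k / N);
        printf("2-way set assoc: set  %u, tag %u\n", k % S, k / S);
        /* Two addresses with the same (k mod N) conflict in the
           direct-mapped cache; the 2-way cache can hold both of them
           in the same set, which is why it approaches fully
           associative performance. */
        return 0;
    }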
