18-548/15-548 Fall 1998

Homework 2

Due Wednesday September 9, 1998


Problem 1:

You are a DARPA program manager, and someone submits a proposal for a multi-chip module for radar signal processing. To provide sufficient computational power, the module must process a 2048 x 2048 array (4 Mega data elements) of 64-bit data elements once every 100 msec.

a) Draw a "plumbing diagram" for this system and label bandwidths for each piece of "pipe" under the following assumptions:

  1. There is one multi-chip module, with 8 identical CPUs bonded to it.
  2. Each CPU processes exactly one-eighth of the data array every 100 msec period, with no data shared among CPUs. (So, each CPU touches all words in its one-eighth of the array, and only those words.) Assume instruction accesses have a 0% cache miss ratio.
  3. Each CPU has a cache with a 10% miss rate, and accesses each 8-byte word of data within its one-eighth of the array 20 times per 100 msec interval.
  4. There is a 64-bit data bus going from the multi-chip module to main memory, which can sustain a transfer rate of one piece of 64-bit data every clock cycle, operating at 50 MHz.
  5. There are two banks of ideal memory (no inter-bank conflicts), each of which can complete a 64-bit word transfer every 20 ns.
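The pipe labels follow from a few multiplications. The sketch below is one way to set up that arithmetic, not an official solution. It assumes that every cache miss moves exactly one 8-byte word across the bus (the problem gives no cache line size) and that there is no write-back traffic. All constant and function names are made up for illustration.

```python
# Illustrative bandwidth arithmetic for the plumbing diagram.
# ASSUMPTIONS (not stated in the problem): each cache miss transfers exactly
# one 8-byte word, and write-back traffic is ignored.

WORD_BYTES = 8                   # 64-bit data elements
ARRAY_WORDS = 2048 * 2048        # whole data array
NUM_CPUS = 8
ACCESSES_PER_WORD = 20           # per 100 msec period
PERIODS_PER_SEC = 10             # one period every 100 msec
MISS_RATE_PERCENT = 10
BUS_HZ = 50_000_000              # one 64-bit transfer per clock
NUM_BANKS = 2
BANK_TRANSFER_NS = 20            # one 64-bit word per 20 ns per bank


def pipe_bandwidths():
    """Return the bandwidth of each pipe in bytes per second."""
    words_per_cpu = ARRAY_WORDS // NUM_CPUS
    # Traffic each CPU generates toward its own cache.
    cpu_to_cache = words_per_cpu * ACCESSES_PER_WORD * PERIODS_PER_SEC * WORD_BYTES
    # Miss traffic leaving one cache, then summed over all CPUs.
    cache_miss_per_cpu = cpu_to_cache * MISS_RATE_PERCENT // 100
    bus_demand = cache_miss_per_cpu * NUM_CPUS
    # What the bus and the memory banks can actually deliver.
    bus_supply = BUS_HZ * WORD_BYTES
    memory_supply = NUM_BANKS * WORD_BYTES * (1_000_000_000 // BANK_TRANSFER_NS)
    return {
        "cpu_to_cache": cpu_to_cache,
        "cache_miss_per_cpu": cache_miss_per_cpu,
        "bus_demand": bus_demand,
        "bus_supply": bus_supply,
        "memory_supply": memory_supply,
    }


if __name__ == "__main__":
    for name, value in pipe_bandwidths().items():
        print(f"{name:>20}: {value / 1e6:10.1f} MB/s")
```

Integer arithmetic is used throughout so that the figures stay exact. Comparing the demand and supply entries shows which pipe is undersized.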

b) What is the bandwidth bottleneck in this system, and how "big" should it be to just barely eliminate the bottleneck?

c) What is the maximum cache miss rate that would eliminate the bandwidth bottleneck observed in part (b) of this question?
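Parts (b) and (c) can be checked with a pair of small helper functions. The sketch below is a hedged illustration, not the official answer. It assumes one 8-byte word per miss and no write-back traffic, and the function names are hypothetical.

```python
from fractions import Fraction

# ASSUMPTIONS (not stated in the problem): one 8-byte word per cache miss,
# no write-back traffic. Function names are illustrative only.

WORD_BYTES = 8


def total_cpu_traffic(array_words, accesses_per_word, periods_per_sec):
    """Bytes per second that all CPUs together request from their caches."""
    return array_words * accesses_per_word * periods_per_sec * WORD_BYTES


def required_bus_bandwidth(cpu_traffic, miss_rate):
    """Bus bandwidth (bytes/s) that just carries the miss traffic."""
    return cpu_traffic * miss_rate


def max_miss_rate(bus_bytes_per_sec, cpu_traffic):
    """Largest miss rate whose traffic still fits in the given bus."""
    return Fraction(bus_bytes_per_sec, cpu_traffic)


if __name__ == "__main__":
    traffic = total_cpu_traffic(2048 * 2048, 20, 10)
    needed = required_bus_bandwidth(traffic, Fraction(1, 10))
    limit = max_miss_rate(50_000_000 * WORD_BYTES, traffic)
    print(f"bus bandwidth needed at 10% misses: {float(needed) / 1e6:.1f} MB/s")
    print(f"maximum miss rate for a 400 MB/s bus: {float(limit):.4f}")
```

Using `Fraction` keeps the ratio exact until it is printed. The same helpers can be reused with a different bus width or clock rate.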


Problem 2:

This is an exploration of memory bandwidth versus latency. Assume that you have a processor which takes 3 clock cycles to access cache.


Problem 3:

Let's say that you have a choice between spending money on cache bandwidth or bus bandwidth. You must choose between the following two design options:

Design Option 1:
Off-chip cache memory access takes 4 clocks (by providing 256 pins for data)
Main memory access takes 24 clocks (by using a 32-bit data bus and cycling 4 times)
   
Design Option 2:
Off-chip cache memory access takes 6 clocks (by providing only 128 pins for data and cycling twice for each transfer)
Main memory access takes 16 clocks (by using a 128-bit data bus)
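One possible way to compare the two options is by average access time as a function of the cache miss rate. The sketch below assumes a simple model that the problem does not state: every access pays the cache access time, and a miss additionally pays the full main memory access time. The dictionary and function names are illustrative.

```python
from fractions import Fraction

# ASSUMED MODEL (not given in the problem statement):
#   average clocks = cache_clocks + miss_rate * memory_clocks
OPTIONS = {
    "Option 1": {"cache_clocks": 4, "memory_clocks": 24},
    "Option 2": {"cache_clocks": 6, "memory_clocks": 16},
}


def average_access_clocks(option, miss_rate):
    """Average clocks per access under the assumed additive model."""
    return option["cache_clocks"] + miss_rate * option["memory_clocks"]


def break_even_miss_rate(a, b):
    """Miss rate at which options a and b have equal average access time."""
    return Fraction(b["cache_clocks"] - a["cache_clocks"],
                    a["memory_clocks"] - b["memory_clocks"])


if __name__ == "__main__":
    opt1, opt2 = OPTIONS["Option 1"], OPTIONS["Option 2"]
    print("break-even miss rate:", break_even_miss_rate(opt1, opt2))
    for miss_rate in (Fraction(1, 10), Fraction(1, 2)):
        t1 = average_access_clocks(opt1, miss_rate)
        t2 = average_access_clocks(opt2, miss_rate)
        print(f"miss rate {float(miss_rate):.2f}: "
              f"Option 1 = {float(t1):.1f} clocks, Option 2 = {float(t2):.1f} clocks")
```

Under this model, the faster cache of Option 1 wins at low miss rates and the faster bus of Option 2 wins at high miss rates. A different miss-penalty model would move the crossover point.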
