### (Lec 18) Electrical Timing Issues: The Elmore Delay Model

### **▼** What you know...

- ▶ Lots of synthesis for logic and for geometry
- ▶ Ditto for verification--for logic
- ▶ Logical timing abstraction: Static timing analysis, topological delay

### **▼** What you don't know...

- ▶ How the geometric design of real, routed wires impacts delay
- ▶ Electrical timing abstraction
- ▶ We need to develop some usable notions of "delay" for use with layout algorithms: models simpler than a full simulation, but accurate enough

(Thanks to Larry Pileggi, for many cool slides & ideas here...)

© R. Rutenbar 2001,

CMU 18-760, Fall01 1

### **Copyright Notice**

### © Rob A. Rutenbar 2001 All rights reserved.

You may not make copies of this material in any form without my express permission.


### Where Are We?

**▼** For more accurate timing, need *electrical* wire delay estimation

|          | M  | T  | W  | Th | F  | Week |
|----------|----|----|----|----|----|------|
| Aug      | 27 | 28 | 29 | 30 | 31 | 1    |
| Sep      | 3  | 4  | 5  | 6  | 7  | 2    |
|          | 10 | 11 | 12 | 13 | 14 | 3    |
|          | 17 | 18 | 19 | 20 | 21 | 4    |
|          | 24 | 25 | 26 | 27 | 28 | 5    |
| Oct      | 1  | 2  | 3  | 4  | 5  | 6    |
|          | 8  | 9  | 10 | 11 | 12 | 7    |
|          | 15 | 16 | 17 | 18 | 19 | 8    |
|          | 22 | 23 | 24 | 25 | 26 | 9    |
|          | 29 | 30 | 31 | 1  | 2  | 10   |
| Nov      | 5  | 6  | 7  | 8  | 9  | 11   |
|          | 12 | 13 | 14 | 15 | 16 | 12   |
| Thnxgive | 19 | 20 | 21 | 22 | 23 | 13   |
|          | 26 | 27 | 28 | 29 | 30 | 14   |
| Dec      | 3  | 4  | 5  | 6  | 7  | 15   |
|          | 10 | 11 | 12 | 13 | 14 | 16   |

- Introduction
- Advanced Boolean algebra
- JAVA Review
- Formal verification
- 2-Level logic synthesis
- Multi-level logic synthesis
- Technology mapping
- Placement
- Routing
- Static timing analysis
- Electrical timing analysis
- Geometric data structs & apps


### Nominal Deadlines...

Last 760 lecture (probably...



HW5 6 PPT slide paper review

Proj 3 demos

- ...and, this is clearly a bit extreme for the last week of class
  - ▶ Open to suggestions for moving some deadlines BACK some...
  - ▶ ...but need to be careful not to mess up people with finals, early travel plans for break, etc


### **Timing Issues in Layout**

### **▼** What's the problem?

- ▶ Delays on signals due to wires no longer negligible
- ▶ Modern designs must meet tight timing specifications
- ▶ Layout tools must guarantee these timing specifications

### ■ How have we addressed this so far in layout?

- ▶ By ignoring it, mostly
- ▶ Implicitly, qualitatively
  - ▷ We try to make layout area small
  - ▷ We try to make clusters close together
  - ▷ We try to make wires short
  - ▷ etc.
  - ▷ All these are good things, but not the same as a guarantee...


# **Timing Issues: Impact of Interconnect**

### **▼** IC technology trends

(Figure: split of input-to-output delay for 1 level of logic; labels read delay=15% and delay=85%.)

### Mid 80s Scenario

Most of the input-to-output delay for 1 level of logic is due to gate delay

Wire delay is a very small component of the overall delay, ~18% here



### Mid 90s Scenario

Half of the input-to-output delay for 1 level of logic is due to wire delay



### Today's Scenario (example bad case)

Most of the input-to-output delay for 1 level of logic is due to wire delay


# **Timing Issues: Role of Layout Tools**

■ Unfortunately, easy for layout tools to screw up the timing properties that "upstream" tools try to achieve



### **▼** Upstream tools

- ...may have no real, physical models for the placement or routing
- ▶ Only have rough estimators to generate constraints on layout


### **Basic Delay Modeling**

- **■** Let's focus in some detail on one important aspect of this overall timing optimization problem
- **▼** Interconnect delay
  - ▶ You do a placement, it puts the pins at a certain distance apart
  - ▶ So, you have to route a wire, it has an input-to-output delay
  - ▶ Where does the delay come from?
  - ▶ How accurately can we predict this delay?
  - ▶ How efficiently can we model this delay for use in a layout tool?



# **Sources of Delay: Model 1**

- **■** Delay = finite speed signal propagation through physical wires
- **■** Model == Length
  - ▶ Delay proportional to length
  - ▶ Shorter = better
- **▼** Analysis
  - ▶ Pro: This is really easy, qualitatively OK
  - ▶ Con: Not quantitatively accurate, extremely crude



### Sources of Delay: Model 2

- **▼** Add: Delay also affected by *circuit drive* limitations
- Model == "Wire load"
  - ▶ Delay proportional to length, fanout, capacitance of the driven pins
  - ▶ Actually called "wire load models", usually model capacitance on a net
- **▼** Analysis
  - ▶ Pro: Qualitatively better
  - ▶ Con: Still focuses mostly on the pins, not on the wire; can be off by 3-5X



# **Sources of Delay: Model 3**

- **▼** Add: Delay comes from *parasitic loading* of the interconnect
  - ▶ Depends critically on exact shape of the wired net
- **▼** Model == *Lumped Electrical Parameter* 
  - ▶ Interconnect must be modeled as a circuit, analyzed as a circuit
- **▼** Why?



Interconnect geometry is now large relative to the devices themselves


### **Interconnect Models: RC Trees**

- **■** Let's see how to derive the most popular model used in layout applications for interconnect delay
- **▼** First: Interconnect -> Circuit



Metal wire has resistance = R to current flowing down its length












# **RC Trees**

- Recall a simple rule from basic circuits (or physics)
  - ▶ Parallel capacitors can be replaced by 1 cap with $\Sigma C$





Note: each R and each C in this tree is probably a different number, since each depends on the geometry of its segment


# **RC Trees**

- **▼ RC Tree general form** 
  - ▶ A tree of resistors (no loops)
  - ▶ Root of tree is where signal is input
  - ▶ Leaves of tree are the driven outputs
  - ▶ Capacitors to ground at all intermediate nodes of the tree








- Famous delay formula called the "Elmore" delay
  - ▶ Derived originally in the 40s for circuits applications
  - ▶ Resurrected in 80s by Penfield, Rubinstein, Horowitz for RC trees
  - ▶ Usually presented as a "magic formula" over the Rs and Cs...

### **▼** Our goal

- ▶ Give the basic delay result, and explain how it's calculated and used
- ▶ Apply the formula to a few illustrative examples
- ► [Aside: Show how to derive the basic result--briefly-- since it's the most useful formula in the performance-based layout business (appendix)]


# **RC Trees: Labeling Convention**

### **▼** Observe

- ▶ We combine ("lump") load capacitance with 1/2 C from last segment
- ▶ In RC tree, each R and each C may be different
- ▶ Give each a name: Ri feeds into node i, Ci hangs off node i
- ▶ Label currents thru Ri as Ii








**▼** Common *patterns* of resistor values in all these eqns

- **▼** Can define some notation: R0k(i)
  - ▶ R0k(i) is the sum of resistors you see walking back up the tree from node "k" to the root, that are ALSO on the path from root to node i
  - ► Called "upstream resistance" for node "k"



# **RC Trees: Elmore Delay**

- More complex example of R0k(i)
  - ▶ Only R0 and R1 are on both paths: from root->4, and from root->3
  - ➤ Turns out the derivation focuses on paths the charging currents take from driver (root) to the individual leaf nodes (load caps)
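The R0k(i) notation is easy to check mechanically. The sketch below assumes a hypothetical topology (root -> 0 -> 1, then 1 -> 2 -> 3 and 1 -> 4) and made-up resistor values; the original figure is not available, so this is only meant to match the statement that R0 and R1 are the resistors shared by the root->3 and root->4 paths.

```python
# parent[k] is the node just upstream of k; None marks the node fed by the driver.
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 1}
# R[k] is the resistor feeding node k (made-up values).
R = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}


def nodes_on_path(k):
    """Nodes whose feeding resistor lies on the path from the root to node k."""
    nodes = set()
    while k is not None:
        nodes.add(k)
        k = parent[k]
    return nodes


def shared_resistance(k, i):
    """R0k(i): resistors on the root->k path that are ALSO on the root->i path."""
    common = nodes_on_path(k) & nodes_on_path(i)
    return sum(R[n] for n in common)


print(shared_resistance(3, 4))   # only R0 and R1 are shared: 1.0 + 2.0 = 3.0
print(shared_resistance(4, 4))   # whole root->4 path: 1.0 + 2.0 + 5.0 = 8.0
```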







# What Does Elmore Delay Try to Model?

- We want an accurate time constant "τ" for each output
  - ▶ Can depend only on the Rs, Cs we know from the RC tree
  - ▶ Different for each output--a unique feature for Elmore model




# RC Trees: The Elmore Delay

**▼** This is the magic formula that we can derive

$$Vi(t) = V0\left(1 - e^{-t/\tau}\right), \qquad \tau = \sum_{\substack{\text{Nodes k} \\ \text{in RC tree}}} R0k \cdot Ck$$

**▼** $\tau$ is "the Elmore Delay"; recall:

▶ We asked this: what does this RC tree leaf voltage Vi(t) look like?

▶ We assumed this: apply a $V0$ step at t=0

► We also assumed: can model voltage Vi(t) as 1 time constant,  $1 - e^{-t/\tau}$ 

► Can derive this:  $\tau = \Sigma_k R0k \cdot Ck$ 

**▼** Note

▶ A general formula for the time constant for the response at any leaf

▶ Assume one time constant $\tau$ is a good approx for the actual delay
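A direct, unoptimized reading of the formula is sketched below: for a chosen leaf i, sum R0k(i)·Ck over every node k in the tree, then plug τ into the one-time-constant waveform. The topology and all R and C values are made up for illustration, and the function names are hypothetical.

```python
import math

# Hypothetical tree: root -> 0 -> 1, then 1 -> 2 -> 3 and 1 -> 4.
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 1}
R = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}   # R[k] feeds node k
C = {0: 1.0, 1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0}   # C[k] hangs off node k


def nodes_on_path(k):
    """Nodes whose feeding resistor lies on the path from the root to node k."""
    nodes = set()
    while k is not None:
        nodes.add(k)
        k = parent[k]
    return nodes


def elmore_tau(i):
    """tau for leaf i: sum over all nodes k of R0k(i) * Ck."""
    path_i = nodes_on_path(i)
    tau = 0.0
    for k in C:
        r0k = sum(R[n] for n in nodes_on_path(k) & path_i)
        tau += r0k * C[k]
    return tau


def leaf_voltage(i, t, v0=1.0):
    """One-time-constant approximation of the step response at leaf i."""
    return v0 * (1.0 - math.exp(-t / elmore_tau(i)))


print(elmore_tau(3), elmore_tau(4))   # 36.0 29.0
```

Note that the two leaves get different time constants even though they share most of the tree, which is the per-output behavior the slides emphasize.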


# **Observations**

### **Note**

- ▶ Basically says we can model the output at 1 leaf of an RC tree with an "equivalent circuit" that looks like 1 equivalent R, 1 eqv. C
- ▶ We don't really know the R or the C though, just that RC =  $\tau$
- ▶ Called a "one time constant" model (makes sense, eh?)

### **▼** Analysis

- ▶ PRO: Easy to compute (can do it recursively by walking tree)
- ▶ PRO: Gives you a unique delay for each output of the tree
- ▶ PRO: Accounts for all the parasitics Rs, Cs of the interconnect
- ► CON: It's still only a one time constant model; sometimes need > 1


# **Trick to Compute Elmore Delay Fast**

### **▼** Do this:

- ▶ Set $\tau = 0$; start walking down tree to the leaf node (arrow)
- ▶ At each resistor, do $\tau += R \cdot \Sigma$ (all caps downstream)
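The walk can be sketched in two passes: one bottom-up pass to total the capacitance below each resistor, then one top-down pass that accumulates τ. The tree, values, and function names below are assumptions for illustration only.

```python
# Hypothetical tree: root -> 0 -> 1, then 1 -> 2 -> 3 and 1 -> 4.
children = {0: [1], 1: [2, 4], 2: [3], 3: [], 4: []}
R = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}   # R[n] feeds node n
C = {0: 1.0, 1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0}   # C[n] hangs off node n
ROOT = 0


def downstream_caps(root):
    """cdown[n] = capacitance at node n plus everything below it."""
    cdown = {}

    def visit(n):
        cdown[n] = C[n] + sum(visit(ch) for ch in children[n])
        return cdown[n]

    visit(root)
    return cdown


def elmore_all(root):
    """Walk down from the root; at each resistor add R * (all caps downstream)."""
    cdown = downstream_caps(root)
    tau = {}
    stack = [(root, 0.0)]
    while stack:
        n, tau_upstream = stack.pop()
        tau[n] = tau_upstream + R[n] * cdown[n]
        stack.extend((ch, tau[n]) for ch in children[n])
    return tau


tau = elmore_all(ROOT)
print(tau[3], tau[4])   # 36.0 29.0
```

Each node is visited a constant number of times, so the delays to every leaf come out of a single pair of tree walks.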




### Now What?

### **▼** The Elmore delay formulas are *immensely* useful

- ▶ Simple enough for layout folks to use them in algorithms
- ▶ Accurate enough that they beat simple length-based schemes
- ▶ (Unfortunately, not so accurate that you can avoid later verification with what are called "higher order" models that incorporate more than one time constant)

### **▼** Applications

► Let's look at a simple example and see how layout decisions affect actual delay, as measured with Elmore


### **Elmore Example**

### **▼** Simple tree with 4 leaf nodes

- ▶ Normalized parameters: r = 1, c = 2
- ▶ Just assume that for a segment, total R = r L / W, C = c W L
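As a small sketch of the segment model just stated, the helper below turns a length and width into the segment's total R and C using the normalized r = 1, c = 2. The function name and the sample length and width are hypothetical; the dimensions of the slide's example tree are not reproduced here.

```python
def segment_rc(length, width, r=1.0, c=2.0):
    """Total R and C of one wire segment: R = r*L/W, C = c*W*L."""
    resistance = r * length / width
    capacitance = c * width * length
    return resistance, capacitance


# Made-up geometry: the same length at two different widths.
narrow = segment_rc(length=4.0, width=1.0)
wide = segment_rc(length=4.0, width=2.0)
print(narrow)   # (4.0, 8.0)
print(wide)     # (2.0, 16.0)
```

Under this model, doubling the width halves R and doubles C, so the segment's own R·C product does not change with width.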








# Elmore Example

OK: what's the delay to each leaf?

- ▶ Since symmetric, only need to compute 1 path
- ▶ Remember the trick:
  1. Set $\tau = 0$, walk from root to leaf
  2. At each resistor, do $\tau += R \cdot \Sigma$ (all caps downstream)







# **Elmore Applications**

- **■** Do people really use this delay metric?
  - Yes

### **▼** Verification

- ▶ It's easy to compute, gives a semi-real delay to each leaf node in an RC tree, allows us to see how wire "shape" affects per-leaf delay
- ▶ So, can use it for verification

### **▼** Synthesis (of layout)

- ▶ Since it is easy to see how a length change or a width change affects per-leaf delay, this becomes an optimizable "degree of freedom" in some apps
- ▶ Good example: clock trees


# Clock Trees: ~Same Delay To Each Leaf

- **▼** Clock is huge global net (1000s of leaf nodes)
  - ► Each leaf is a latch, want ~same delay from root->latch; max(arrival time difference at latches) is called "skew", want this small

- Size: 16,818 latches
- Tech: 0.35 µm
- Freq: 200 MHz (T = 5 ns)
- Skew: 500 ps

Sample (1mm²) local distrib.
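The skew definition above amounts to the spread between the latest and earliest arrival at any latch. A minimal sketch, with a hypothetical function name and made-up per-latch Elmore delays:

```python
def skew(arrival_times):
    """Largest difference in arrival time between any two latches."""
    return max(arrival_times) - min(arrival_times)


# Made-up root->latch delays in picoseconds.
arrivals_ps = [410.0, 455.0, 430.0, 470.0]
print(skew(arrivals_ps))   # 60.0
```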






# **Delay Optimization Problem**

- **▼** Proper location of "tap" points to balance delay to sub-trees
  - ▶ You have 2 routed clock "subtrees". You want to connect them, so you route a wire between them.
  - ▶ But, where do you put the connection--the "tap" point--on this wire, so that delay down each subtree is matched?








### This is a Geometric/Delay Optimization Task

- **■** Let us redraw for clarity
  - ▶ You already have 2 complete RC trees going down to latches
  - ▶ You have decided to "match" the local "roots" of these 2 trees
  - ▶ You will connect with a straight wire (you hope)
  - ▶ Problem: Where to put the tap point to equalize the Elmore delay on each side?





# **Exact Zero Skew**

- **▼** So what have we got?
  - ▶ Complete RC model for the 2 subtrees, and the connecting (match) wire
  - In terms of a variable x that we don't know, that tells us where to tap
  - ▶ Goal: Elmore delay down to left latch sites == Elmore delay to right




# **Elmore Hacking**

- **▼** Recall
  - ▶ Delay (RC) from root to leaf in an RC tree was calculated like this:

$$\text{delay}(\text{root} \to \text{leaf}) = \sum_{\substack{\text{nodes i} \\ \text{from root} \\ \text{to leaf}}} \text{Ri} \cdot \text{Cdi}, \qquad \text{Cdi} = \sum (\text{capacitance downstream of Ri})$$

- **▼** Can also define delay from root to an internal node j
  - ▶ Delay (RC) from root to internal node j is similar:

$$\text{delay}(\text{root} \to \text{j}) = \sum_{\substack{\text{nodes i} \\ \text{from root} \\ \text{to j}}} \text{Ri} \cdot \text{Cdi}, \qquad \text{Cdi} = \sum (\text{capacitance downstream of Ri})$$






### **Exact Zero Skew**

### ■ What do we want to accomplish here?

- ▶ Delay to the left = delay to the right
- ▶ So, we equate the 2 delays, and we get 1 equation in 1 unknown, x

$$xR(xC/2 + C1) + t1 = (1-x)R[(1-x)C/2 + C2] + t2$$

▶ Can solve this analytically, get a unique x solution

$$x = \frac{(t2 - t1) + R(C2 + C/2)}{R(C + C1 + C2)}$$
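The closed form is easy to sanity-check numerically. The sketch below reads R and C as the total resistance and capacitance of the connecting wire, C1 and C2 as the total capacitance of the left and right subtrees, and t1 and t2 as the Elmore delays inside each subtree; that reading of the symbols is an assumption based on the form of the equation, and the numbers are made up.

```python
def tap_point(R, C, C1, C2, t1, t2):
    """Solve x*R*(x*C/2 + C1) + t1 = (1-x)*R*((1-x)*C/2 + C2) + t2 for x."""
    return ((t2 - t1) + R * (C2 + C / 2)) / (R * (C + C1 + C2))


def side_delays(x, R, C, C1, C2, t1, t2):
    """Elmore delay from the tap point down the left and right sides."""
    left = x * R * (x * C / 2 + C1) + t1
    right = (1 - x) * R * ((1 - x) * C / 2 + C2) + t2
    return left, right


params = dict(R=2.0, C=4.0, C1=3.0, C2=1.0, t1=1.0, t2=5.0)
x = tap_point(**params)
left, right = side_delays(x, **params)
print(x, left, right)   # 0.625 6.3125 6.3125
```

The x² terms cancel when the two sides are equated, which is why the equation is linear in x and has a unique solution.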


### **Exact Zero Skew**

### **▼** Interpretation

- ▶ Value of x tells us where to put the tap point on the matching wire
- ▶ If we put xL units of wire on left, (1-x)L on right, then Elmore delays balance -- assuming that Elmore delays inside each subtree, from subtree root to each leaf in each subtree, also balance
- ▶ Can get "exact zero skew" this way -- hence name of algorithm



### **Exact Zero Skew: One Complication...**

**■** You *want* x to come out $0 \le x \le 1$

- ▶ But it might not...!
- ▶ Why not? If the trees are too unbalanced, there IS NO tap point that will balance the Elmore delay!

(Figure: two unbalanced cases, each a pair of RC trees whose leaf nodes are latches; in one case x > 1 will result, in the other x < 0 will result.)
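The out-of-range conditions can be read straight off the tap-point formula $x = [(t2 - t1) + R(C2 + C/2)] / [R(C + C1 + C2)]$. As a worked step, assuming R > 0 and non-negative capacitances so that the denominator is positive:

$$x < 0 \iff t1 - t2 > R(C2 + C/2), \qquad x > 1 \iff t2 - t1 > R(C1 + C/2)$$

In words: x < 0 happens when the left subtree is already slower than the right one even with the entire connecting wire placed on the right side, and x > 1 is the mirror-image case.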











### **Exact Zero Skew**

- **■** Can similarly solve for when x>1...
  - ▶ Basically the same answer, with t1 and t2, C1 and C2 switched

### **■** Utility

- ▶ If you use a recursive, bottom up approach to geometrically route tree...
- ► Cool idea is: at every point where you make a wiring/tapping decision, you strive for perfectly balanced Elmore delay to both subtrees. Can solve analytically for this.
- ▶ If all the Elmore delays perfectly balanced, you get: Exact Zero Skew
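A minimal sketch of the bottom-up idea, under stated assumptions: each subtree is summarized by a pair (t, cap), its internal Elmore delay and its total capacitance, and each merge uses the tap-point formula for x. The function name `merge`, the (t, cap) summary, and all values are hypothetical; the out-of-range cases are simply rejected here rather than solved.

```python
def merge(left, right, R, C):
    """Join two balanced subtrees with a wire of total resistance R, capacitance C.

    left and right are (t, cap) pairs. Returns (x, t, cap) for the merged
    tree as seen from the tap point.
    """
    t1, c1 = left
    t2, c2 = right
    x = ((t2 - t1) + R * (c2 + C / 2)) / (R * (C + c1 + c2))
    if not 0.0 <= x <= 1.0:
        # These cases need the separate treatment mentioned in the slides;
        # this sketch does not attempt it.
        raise ValueError(f"no tap point on the wire balances the delays (x = {x:.3f})")
    t = x * R * (x * C / 2 + c1) + t1     # equals the right-side delay by construction
    return x, t, C + c1 + c2


# Four identical latches (made-up load of 1.0 each), merged pairwise, then at the top.
latch = (0.0, 1.0)
_, t_a, cap_a = merge(latch, latch, R=2.0, C=2.0)
_, t_b, cap_b = merge(latch, latch, R=2.0, C=2.0)
x_top, t_top, cap_top = merge((t_a, cap_a), (t_b, cap_b), R=4.0, C=4.0)
print(x_top, t_top, cap_top)   # 0.5 11.5 12.0
```

With perfectly symmetric subtrees every tap lands at the midpoint, as expected; the formula earns its keep when the two sides differ.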


### **Clock Balancing: By Wire Widening**

**▼** Picking the right tap point, maybe adding wire, is not the *only* way

**■** Alternative: wire widening

- ▶ Widen wire on the "long" side; wider = less resistance = decreased delay on this side

(Figure: connecting wire between the local root of the RC tree on the left and the local root of the RC tree on the right; leaf nodes = latches.)
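A rough worked equation shows where the benefit of widening comes from. It assumes the simple segment model R = rL/W, C = cWL, a single segment with half its capacitance lumped at the far end, and a hypothetical symbol $C_{load}$ for the subtree capacitance that segment drives:

$$R\left(\frac{C}{2} + C_{load}\right) = \frac{rL}{W}\left(\frac{cWL}{2} + C_{load}\right) = \frac{r c L^2}{2} + \frac{r L \, C_{load}}{W}$$

Under these assumptions the first term does not depend on W, so widening only shrinks the term that charges the downstream load. The wider wire also adds capacitance that everything upstream must drive, which this one-segment expression does not capture.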




### **Summary**

- **■** Interconnect increasingly responsible for chip speed
  - ▶ Technology is scaling to smaller sizes
  - ▶ Chips are being designed to run faster
- Layout tools responsible for part of timing guarantee
  - ▶ Upstream tools handle levels of logic, etc
  - ▶ Physical design tools responsible for partitioning, placement, routing
  - ▶ All of these impact wire length and distribution
- Individual wires modeled as complex circuits
  - ▶ From a layout view, RC tree is the nicest, most useful model
  - ▶ Elmore delay is the easiest-to-compute delay estimator for 1 in->out path
  - ▶ Can get the Elmore delay with a little very basic circuit analysis
  - ▶ There are sophisticated estimators beyond Elmore...
  - ► Can use for both verification, and for layout optimizations (eg clock)


# Appendix: Why the Delay Trends?

### **▼** Qualitative answer

- ➤ Signals propagate through the physical materials of gates, wires with finite delay
- ▶ Wires, gates getting physically smaller, but the interactions of the low-level technology parameters are complicated...



### **Deriving the Elmore Delay**

### **▼** From first principles

- ▶ Avoid complex linear system theoretic math
- ► Want to do this with plain old Kirchhoff laws and some basic circuit analysis, and some simple calculus

### **▼** Turns out to be not too hard

▶ Though it does turn on a few representation tricks for the algebra that are not obvious...




















- Suggests a change in strategy
  - ▶ Let's try to express everything interesting in the circuit using only combinations of the currents thru these capacitors
  - ► Let's call current thru Ck as Jk (and we know Jk = Ck•dVk/dt)
- **▼** Idea
  - ▶ Use superposition in the form of mesh analysis
  - ▶ Currents add up in each branch of the circuit



What's the current thru cap C? J1 - J2

What's KCL at top of C? J1 - J2 - C\*dV/dt = 0






- **■** What are these "sums of R's" on each J?
  - ► "Upstream" resistance on the unique path from root to V4 seen by the current Jk thru each capacitor Ck

▶ Define this as R0k; rewrite above as $Vin - \Sigma_k R0k \cdot Jk - V4 = 0$




# **RC Trees: Elmore Delay**

- **▼** Swell, but we still *don't* have V4(t)...
  - ▶ Replace Jk by Ck•dVk/dt

$$Vin(t) - \Sigma_k R0k \cdot Ck \cdot dVk/dt - V4(t) = 0$$

▶ Assume Vin(t) is a 1 V step applied at time = 0; rearrange

$$1 - V4(t) = \Sigma_k R0k \cdot Ck \cdot dVk/dt$$

- **▼** Problems
  - ▶ We don't know V4(t) -- it's what we want to solve for
  - ▶ We don't know all those C dV/dt derivatives at leaves either
  - ▶ We need a couple of tricks to get around these...


- **▼** Trick: what does V4(t) actually do, as a waveform?
  - ▶ Step back for a moment and think: what will V4(t) look like?
  - ▶ Answer: some exponential ramp rising from 0 V to a 1 V asymptote
  - ▶ Why? The 1 V step input supplies current to charge capacitors in the RC tree; eventually they all charge up, current stops flowing, voltages become constant




# **RC Trees: Elmore Delay**

- **▼** Recall: Apply a voltage step to a circuit with a capacitor...
  - ▶ Current starts to flow...
  - ▶ Eventually the cap charges up, and current stops flowing
  - ► Cap charges up to V0 here
  - ▶ Current I eventually goes to 0



- **▼** OK, but we have a whole *tree* of Rs and Cs...
- **▼** Trick: let's integrate both sides to get rid of those derivatives
  - ▶ Look at our expression for 1 - V4(t)
  - ▶ Integrate it, from 0 to ∞





- **▼** Turns out this is enough for our needs
  - ▶ Let's assume that V4(t) follows an exponential rise, just like a circuit with a single R and a single C; let  $\tau = R \cdot C$  here.
  - ▶ So, we shall assume that

$$V4(t) = 1 - e^{-t/\tau}$$

▶ ...but we don't know $\tau$. But we do know the area above $V4(t)$!
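The area argument can be written out in a few lines. This is a sketch of the step, assuming every capacitor starts discharged, so each node voltage Vk goes from 0 at t = 0 to 1 V as t goes to infinity. Integrating $1 - V4(t) = \Sigma_k R0k \cdot Ck \cdot dVk/dt$ from 0 to ∞ gives

$$\int_0^\infty \left[1 - V4(t)\right] dt = \Sigma_k R0k \cdot Ck \cdot \left[Vk(\infty) - Vk(0)\right] = \Sigma_k R0k \cdot Ck$$

while the assumed single-exponential shape gives

$$\int_0^\infty \left[1 - \left(1 - e^{-t/\tau}\right)\right] dt = \int_0^\infty e^{-t/\tau} \, dt = \tau$$

Equating the two areas yields $\tau = \Sigma_k R0k \cdot Ck$.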



### **RC Trees: The Elmore Delay**

**▼** This is the magic formula that we want

$$V4(t) = 1 - e^{-t/\tau}, \qquad \tau = \Sigma_k R0k \cdot Ck$$

**▼** $\tau$ is "the Elmore Delay"; recall:

▶ We asked this: what does this RC tree leaf voltage Vi(t) look like?

► We assumed this: apply 1 V step at t=0

► We also assumed: can model voltage Vi(t) as 1 time constant,  $1 - e^{-\frac{t}{\tau}}$ 

► We derived this:  $\tau = \Sigma_k R0k \cdot Ck$ 

**Note** 

▶ A general formula for the time constant for the response at any leaf

▶ (Nothing in top eqn is really specific to node 4, except which resistors)

▶ Assume one time constant $\tau$ is a good approx for the actual delay


# **Observations**

### **Note**

- ▶ Basically says we can model the output at 1 leaf of an RC tree with an "equivalent circuit" that looks like 1 equivalent R, 1 eqv. C
- ▶ We don't really know the R or the C though, just that RC =  $\tau$
- ▶ Called a "one time constant" model (makes sense, eh?)

### **▼** Analysis

- ▶ PRO: Easy to compute (can do it recursively by walking tree)
- ▶ PRO: Gives you a unique delay for each output of the tree
- ▶ PRO: Accounts for all the parasitics Rs, Cs of the interconnect
- ► CON: It's still only a one time constant model; sometimes need > 1


# Elmore Delay: Circuits Aside

- **■** That magic $\tau$ is actually derivable several other ways
  - ▶ Recall that for any linear system (circuit) you can characterize it by its impulse response, denoted h(t), which is what comes out when you put in a Dirac $\delta(t)$




### **Elmore Delay: Circuits Aside**

- Turns out you can see more in frequency domain
  - ▶ Use the Laplace transform, which turns differential eqns into plain, old algebraic equations

$$\begin{split} F(s) &= \int_0^\infty f(t) \, e^{-st} \, dt \\ H(s) &= \int_0^\infty h(t) \, e^{-st} \, dt = \int_0^\infty h(t) \, [1 + (-st)/1! + (-st)^2/2! + ...] \, dt \\ &= \underbrace{\int_0^\infty h(t) \, dt}_{\text{0th moment of } h(t)} + (-s) \underbrace{\int_0^\infty t \cdot h(t) \, dt}_{\text{1st moment of } h(t)} + \frac{(-s)^2}{2!} \underbrace{\int_0^\infty t^2 \cdot h(t) \, dt}_{\text{2nd moment of } h(t)} + ... \end{split}$$
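As a quick check of the moment expansion, consider the one-time-constant model itself. Its step response is $1 - e^{-t/\tau}$, so its impulse response is the derivative of that, and the first two moments follow by direct integration:

$$h(t) = \frac{1}{\tau} e^{-t/\tau} \quad\Rightarrow\quad \int_0^\infty h(t) \, dt = 1, \qquad \int_0^\infty t \cdot h(t) \, dt = \tau$$

So for this model $H(s) = 1 - s\tau + ...$, and the first moment of the impulse response is exactly the time constant.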


**Elmore Delay: Circuits Aside** 

- Elmore delay uses the 1st moment of h(t) to approximate the response of the circuit to a voltage step applied at t=0
  - ▶ 1 moment gives you 1 time constant, so you follow 1 exp rise
- What happens if you want more accuracy?
  - ▶ You need to use more of these moments in your approximation
  - ▶ Technique called "moment matching"
  - Assumes you can get 'em, then "curve fit" a response waveform
  - ▶ Best known algorithms for doing it?
    - ▷ AWE: Asymptotic Waveform Eval., [Rohrer & Pillage TCAD90]
    - ▷ Lots of follow-on work to this
    - ▷ You need to use some subtle circuits ideas to get more than the first moment, stuff beyond our self-imposed I=C•dV/dt limit


# Circuit Aside: AWE Example

- **▼** Evaluation of clock signal network on DEC Alpha
  - ▶ 1st generation ALPHA chip, clock analyzed using AWE techniques
  - ► This allows us to get a more accurate delay than Elmore, using more than one time constant





Arrival time of clock (ps) as function of position on chip; Note clock driver is in chip center
