### 18-344: Computer Systems and the Hardware-Software Interface Fall 2023



**Lecture 3: Computer Architecture Basics**

### **Course Description**

This course covers the design and implementation of computer systems from the perspective of the hardware/software interface. The purpose of this course is for students to understand the relationship between the operating system, software, and computer architecture. Students who complete the course will have learned operating system fundamentals, computer architecture fundamentals, compilation to hardware abstractions, and how software actually executes from the perspective of the hardware/software boundary. The course focuses especially on the relationships between software and hardware, and on how those relationships influence the design of a computer system's software and hardware. The course conveys these topics through a series of practical, implementation-oriented lab assignments.

Credit: Brandon Lucia

#### What did we talk about last time?

- Hardware vs. software tradeoffs
- von Neumann vs. Harvard architecture and the beginnings of a design space
- An optimization exercise by example
- Amdahl's Law (and Gustafson's Law, by contrast)

## Hardware/software boundary



```
int walk_page_range(struct mm_struct *mm, unsigned long start,
                unsigned long end, const struct mm_walk_ops *ops,
                void *private)
{
        int err = 0;
        unsigned long next;
        struct vm_area_struct *vma;
        struct mm_walk walk = {
                .ops            = ops,
                .mm             = mm,
                .private        = private,
        };

        if (start >= end)
                return -EINVAL;

        if (!walk.mm)
                return -EINVAL;

        mmap_assert_locked(walk.mm);

        vma = find_vma(walk.mm, start);
        do {
                if (!vma) { /* after the last vma */
                        walk.vma = NULL;
                        /* ... excerpt ends here ... */
```

## Our first hw/sw interface: The von Neumann Computing Model



John von Neumann's Big Idea:

Programs are data.



### Optimizing our Harvard Architecture



### How about changing the code?



### Amdahl's Law



100% of execution time



#### Amdahl's Law:

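$$T_{\text{optimized}} = (1-p)\cdot\frac{T}{1.0} + p\cdot\frac{T}{s}$$

Or equivalently:

$$\text{overall speedup} = \frac{T}{T_{\text{optimized}}} = \frac{1}{\dfrac{1-p}{1.0} + \dfrac{p}{s}}$$

where $T$ is the original execution time, $p$ is the fraction of that time spent in the part being optimized, and $s$ is the speedup of that part alone.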

5% - Floating Point
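To make the formula concrete, here is a minimal Python sketch of Amdahl's Law. The function names `amdahl_speedup` and `amdahl_limit` are made up for illustration; `p` is the fraction of execution time being optimized and `s` is the speedup of that part alone. The 5% value is the floating-point share shown above.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of execution time is sped up by s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_limit(p: float) -> float:
    """Upper bound on the overall speedup as s grows without bound."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


if __name__ == "__main__":
    # Floating point is 5% of execution time in the example above.
    for s in (2, 10, 1000):
        print(f"p=0.05, s={s}: {amdahl_speedup(0.05, s):.4f}x")
    print(f"p=0.05, limit: {amdahl_limit(0.05):.4f}x")
```

With `p = 0.05`, the limit is `1 / 0.95`, so no amount of floating-point optimization can buy much more than a 5% overall improvement.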



### Another view of the world: Gustafson's Law



Idea: find an *optimizable* part of your system and make it *bigger*. If we know that memory is optimizable, why not optimize more and do more memory accesses?


Gustafson's Law: Sequential part does not grow as optimizable part grows. Can always add more optimizable part and make sequential part matter less

- Assume that we can scale up the number of parallel memory operations, N
- Assume that we can scale the input to use all N parallel memory operations

```
data_size = 10
data[data_size] = {...}
if(...) { ... }
...  // 18 more of these conditionals
if(...) { ... }
for d in 0..data_size { data[d]++ }
```

```
data_size = 100000
data[data_size] = {...}
if(...) { ... }
...  // 18 more of these conditionals
if(...) { ... }
#parallel[N=1000]
for d in 0..data_size { data[d]++ }
```


85% - Memory Accesses

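#### Gustafson's Law for overall speedup with speedup factor of N:

Assume the optimized time is normalized to $T = 1$. The unoptimized time for the same scaled-up work is

$$T' = (1-p)\,T + p\,T\,N = (1-p) + pN$$

so the scaled speedup is

$$\text{scaled speedup} = \frac{T'}{T} = (1-p) + pN$$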


#### Scale parallel memory accesses, N, up to 1000?

```
Scaled Speedup = (1-p) + 1000p = 999p + 1
Scaled Speedup = 999 * 0.85 + 1 = 850.15 ≈ 850x
```
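The arithmetic above can be checked with a short Python sketch. The function name `gustafson_scaled_speedup` is illustrative, not from the lecture; `p` is the optimizable fraction (85% memory accesses here) and `n` is the number of parallel memory operations. The fixed-workload function is included only for contrast and uses the standard Amdahl formula.

```python
def gustafson_scaled_speedup(p: float, n: float) -> float:
    """Scaled speedup when the optimizable fraction p grows with n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    return (1.0 - p) + p * n


def fixed_workload_speedup(p: float, n: float) -> float:
    """Amdahl-style speedup for the same p and n, with the input size fixed."""
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    p, n = 0.85, 1000
    print(f"scaled workload: {gustafson_scaled_speedup(p, n):.2f}x")
    print(f"fixed workload:  {fixed_workload_speedup(p, n):.2f}x")
```

The gap between the two results is the point of the slide: the scaled result assumes the input grows to keep all N parallel memory operations busy, while the fixed-workload result is bounded by the 15% that stays sequential.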


## What is a Computer Architecture?

- Building up to our first architecture
- Defining the ISA: Architecture vs. Microarchitecture
- RISC vs. CISC ISAs
- RISC-V ISA

## Our CPU from last time is incomplete

**CPU** 

Control & Instruction Sequencing ("Control")

Arithmetic, Logic, and Data Manipulation ("ALU")

## Basic Architecture: State + processing elements



### Building up to our first architecture: ALU






Design choice – what operations do we support here?




### Building up to our first architecture: ALU + Registers






Registers are stateful elements, plus the control required to access them. They provide the inputs to operations and store the outputs of operations.




Registers are **named & explicit**. Implication of explicit names?

**Design choice** – how many stateful elements / registers do we support?
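The two design choices so far (which ALU operations to support, and how many registers) can be sketched as a tiny Python model. Everything here is an assumption made for illustration: the 32-bit width, the five operations in `ALU_OPS`, the default of 32 registers, and the names `RegisterFile` and `execute`.

```python
from typing import Callable, Dict, List

MASK32 = 0xFFFFFFFF  # assumed 32-bit datapath

# Design choice: the set of operations the ALU supports.
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


class RegisterFile:
    """Named, explicit state: every register is addressed by its number."""

    def __init__(self, num_regs: int = 32) -> None:
        # Design choice: how many registers exist.
        self.regs: List[int] = [0] * num_regs

    def read(self, idx: int) -> int:
        return self.regs[idx]

    def write(self, idx: int, value: int) -> None:
        self.regs[idx] = value & MASK32


def execute(rf: RegisterFile, op: str, rd: int, rs1: int, rs2: int) -> None:
    """Read two source registers, run the ALU, write the destination."""
    result = ALU_OPS[op](rf.read(rs1), rf.read(rs2))
    rf.write(rd, result)
```

Because registers are named explicitly, each operation has to say which registers it reads and writes, so the register count directly determines how many bits an instruction must spend on naming them.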



**Instruction** gets decoded into signals that control the other parts of the system (more on encoding / decoding in a few slides)



**Instruction memory** holds all of the bits of all of the instructions that we might ever use to control other units.

**Design choice:** Need to think about where we put this memory (and its hierarchy of caches)



**Instruction fetch logic** reads the PC, loads the instruction at that address from instruction memory, and sends it to decode.

**Design choices:** how much to fetch at once? What to fetch next (not always obvious)?



Remember our fetch optimization from last time? That would go here. Specialized instruction memory access logic. (Physical memory may be the same, though)



#### **Sequential Control:**

Each cycle, update the PC by adding 4.

Implication for software of our current design?
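A minimal Python sketch of fetch plus sequential control, assuming fixed 4-byte instructions stored little-endian in a byte-addressed instruction memory. The names `fetch` and `run_sequential` are hypothetical, and decode/execute are left out.

```python
from typing import List

INSTRUCTION_BYTES = 4  # assumed fixed-width instructions


def fetch(instruction_memory: bytes, pc: int) -> int:
    """Load the instruction word that the PC points at."""
    word = instruction_memory[pc:pc + INSTRUCTION_BYTES]
    if len(word) != INSTRUCTION_BYTES:
        raise IndexError(f"no complete instruction at pc={pc}")
    return int.from_bytes(word, "little")


def run_sequential(instruction_memory: bytes) -> List[int]:
    """Fetch every instruction in order; the only PC update is PC + 4."""
    pc = 0
    trace: List[int] = []
    while pc < len(instruction_memory):
        trace.append(fetch(instruction_memory, pc))
        pc += INSTRUCTION_BYTES  # sequential control
    return trace
```

One reading of the question above: with `pc += 4` as the only update rule, each instruction is fetched exactly once and in order, so software on this design cannot express loops or conditionals until some form of branching is added.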



Key Idea: What we encode here has implications for other units and software layers above the instruction definition level.

**Mechanism** of decoding and **content** of encoded/decoded instructions are orthogonal concepts: **how?** vs. **what?**















### Building up to our first architecture: Branching











## A Complete (but slightly messy) RISC-V-ish Datapath



## A "single-cycle" design










## Thinking about latency: ALU Operations






### Thinking about latency: Memory






### Implication of operation latencies?

- A single-cycle design means that the cycle time of the system is set by the latency of the longest-latency operation.
- In our case, that is the memory latency; ALU operations finish early and leave slack in the cycle.
- Unless every operation is a memory operation, the cycle time is overprovisioned for every non-memory operation.
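The cost of that overprovisioning can be estimated with a short Python sketch. The latencies and instruction mix used in the usage example are invented for illustration and are not measurements from the lecture; the function names are likewise hypothetical.

```python
from typing import Dict


def single_cycle_time(latencies_ns: Dict[str, float]) -> float:
    """In a single-cycle design, the slowest operation sets the cycle time."""
    return max(latencies_ns.values())


def average_slack(latencies_ns: Dict[str, float], mix: Dict[str, float]) -> float:
    """Average idle time per cycle, weighted by how often each op occurs."""
    cycle = single_cycle_time(latencies_ns)
    return sum(frac * (cycle - latencies_ns[op]) for op, frac in mix.items())


if __name__ == "__main__":
    latencies = {"alu": 1.0, "memory": 4.0}  # hypothetical latencies in ns
    mix = {"alu": 0.7, "memory": 0.3}        # hypothetical instruction mix
    print(f"cycle time:    {single_cycle_time(latencies):.1f} ns")
    print(f"average slack: {average_slack(latencies, mix):.1f} ns per cycle")
```

Under these made-up numbers, more than half of each cycle is idle on average, which is the motivation for designs that do not tie every operation to the worst-case latency.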

## Where is the HW/SW Interface in the Datapath?



## Where is the HW/SW Interface?



## Instruction memory holds software



### Big Idea: Instruction Bits are Control Signals
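As a concrete illustration, the Python sketch below slices a 32-bit RISC-V R-type instruction word into its fields. The field positions follow the RV32I R-type format; the helper names `bits` and `decode_r_type` are made up for this example, and the mapping from fields to actual datapath control wires is left out.

```python
from typing import Dict


def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_r_type(word: int) -> Dict[str, int]:
    """Split an R-type instruction into the fields that steer the datapath."""
    return {
        "opcode": bits(word, 6, 0),    # instruction class
        "rd": bits(word, 11, 7),       # register-file write address
        "funct3": bits(word, 14, 12),  # selects the ALU operation
        "rs1": bits(word, 19, 15),     # register-file read address 1
        "rs2": bits(word, 24, 20),     # register-file read address 2
        "funct7": bits(word, 31, 25),  # further ALU operation select
    }


if __name__ == "__main__":
    # 0x002081B3 encodes `add x3, x1, x2`.
    for name, value in decode_r_type(0x002081B3).items():
        print(f"{name:7s} = {value}")
```

The register numbers go straight to the register file's address ports and the function bits select the ALU operation, so the encoding chosen by the ISA is also a description of what the hardware must be wired to do.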






### Instruction Set Architecture

The ISA defines the **architecture** of the machine




A **microarchitecture** implements the features of the architecture






For a given architecture there are **many** perfectly good microarchitectural implementations




#### **Architecture:**

Sequentially-numbered, general-purpose registers



#### **Microarchitecture:**

Two SRAM banks, with each register stored in the bank chosen by the parity of its register number
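The distinction can be sketched in Python as two register files that expose the same architectural interface. This assumes "parity" means even versus odd register numbers; both class names are hypothetical.

```python
from typing import List, Tuple


class FlatRegisterFile:
    """One storage array, indexed directly by register number."""

    def __init__(self, num_regs: int = 32) -> None:
        self._regs: List[int] = [0] * num_regs

    def read(self, idx: int) -> int:
        return self._regs[idx]

    def write(self, idx: int, value: int) -> None:
        self._regs[idx] = value


class ParityBankedRegisterFile:
    """Two banks: even-numbered registers in one, odd-numbered in the other."""

    def __init__(self, num_regs: int = 32) -> None:
        if num_regs % 2 != 0:
            raise ValueError("num_regs must be even to split into two banks")
        self._even: List[int] = [0] * (num_regs // 2)
        self._odd: List[int] = [0] * (num_regs // 2)

    def _locate(self, idx: int) -> Tuple[List[int], int]:
        # The low bit picks the bank; the remaining bits pick the row.
        bank = self._odd if idx & 1 else self._even
        return bank, idx >> 1

    def read(self, idx: int) -> int:
        bank, row = self._locate(idx)
        return bank[row]

    def write(self, idx: int, value: int) -> None:
        bank, row = self._locate(idx)
        bank[row] = value
```

Software sees sequentially numbered registers through `read` and `write` in both cases; the banking is invisible above the interface, which is what makes it a microarchitectural choice rather than an architectural one.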

### Instruction Set Architecture

The ISA is the **vocabulary** of the machine



## What should go in the ISA?

### **Reduced Instruction Set Computer**

#### Simple primitives:

Let software compose complex operations

#### Register operands:

Decouple functionality from memory accesses

#### Few total operations:

Usually only one way to do something

### **Complex Instruction Set Computer**

#### Simple & complex operations:

Hardware provides complex functionality

#### Many operations:

Often several ways to do the same thing

#### Register and memory operands:

Operations may directly manipulate memory







```
rd = M[imm]
rd = M[reg]
rd = M[reg + imm]
rd = M[PC + imm]
```




Plus all of these combinations:

`D(Rb,Ri,S)` addresses `Mem[Reg[Rb] + S*Reg[Ri] + D]`
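The general addressing form above can be sketched in Python as follows. The restriction of `S` to 1, 2, 4, or 8 is the x86 rule and is an assumption here, since the slide does not name an ISA; `effective_address` and `composed_address` are hypothetical names, and the register names in the usage note are placeholders.

```python
from typing import Dict, Optional

VALID_SCALES = (1, 2, 4, 8)  # assumed x86-style scale factors


def effective_address(regs: Dict[str, int], d: int = 0,
                      rb: Optional[str] = None, ri: Optional[str] = None,
                      s: int = 1) -> int:
    """CISC style: one operand computes Reg[Rb] + S*Reg[Ri] + D."""
    if s not in VALID_SCALES:
        raise ValueError("scale must be 1, 2, 4, or 8")
    base = regs[rb] if rb is not None else 0
    index = regs[ri] if ri is not None else 0
    return base + s * index + d


def composed_address(regs: Dict[str, int], d: int, rb: str, ri: str, s: int) -> int:
    """RISC style: the same address built from simple register steps."""
    shift = VALID_SCALES.index(s)  # 1, 2, 4, 8 -> shift by 0, 1, 2, 3
    t = regs[ri] << shift          # shift instruction
    t = regs[rb] + t               # add instruction
    return t + d                   # reg + imm addressing in the load itself
```

For example, with `regs = {"rb": 0x1000, "ri": 3}`, both functions return `0x1014` for `d=8, s=4`. The CISC form packs the work into one instruction that the hardware must support; the RISC form leaves the composition to software, typically the compiler.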


What are the pros and cons of each?

How does RISC vs. CISC affect the microarchitecture, compiler, program, programmer?

## Principles of ISA Design

### **General Principles**

Regularity – "Law of least astonishment"

Orthogonality – keep separable concerns separate

Composability – regular, orthogonal ops combine easily

### **Specific Principles**

One vs. All – precisely one way to do it, or all ways should be possible

Primitives, not solutions – solve by coding, compiling, & synthesizing

### "Blatant opinions" (matters of taste)

Addressing – not limited to simple arrays, etc.

Environment Support – exceptions, processes, debugging, etc.

Deviations – deviate from these rules only in implementation-specific ways

Designing irregular structures at the chip level is *very* expensive.

Some architectures have provided direct implementations of high-level concepts. In many cases these turn out to be more trouble than they are worth.



### What did we just learn?

- Computer architectures define the HW/SW interface through the ISA
- There is a difference between architecture and microarchitecture
- Many valid microarch. implementations of an architecture exist
- RISC vs. CISC architectures are extrema on a spectrum
- Principles of ISA design (Wulf)

### What to think about next?

- The basics of the RISC-V RV32I ISA and some other hw/sw interfaces
- More microarchitectural concepts
  - Pipelining our microarchitecture & instruction-level parallelism
  - Control hazards & branch prediction