

Each stored row of the page table is called a **page table entry** (the grayed section is the first page table entry). The page table is stored *in memory*; the OS sets a register telling the hardware the address of the first entry of the page table. The processor updates the "page dirty" in the page table: "page dirty" bits are used by the OS to know whether updating a page on disk is necessary. Each process gets its own page table.

- Protection Fault--The page table entry for a virtual page has permission bits that prohibit the requested operation
- Page Fault--The page table entry for a virtual page has its valid bit set to false. The entry is not in memory.

## The Translation Lookaside Buffer (TLB)

A cache for the page table. Each block is a single page table entry. If an entry is not in the TLB, it's a TLB miss. Assuming *fully associative*:

| ,     | Tag = Virtual Page Number | Page Table Entry |                 |                      |  |
|-------|---------------------------|------------------|-----------------|----------------------|--|
| Valid |                           | Page Dirty       | Permission Bits | Physical Page Number |  |
|       |                           |                  |                 |                      |  |

## The Big Picture Revisited



## **Exercises:**



2) What should happen to the TLB when a new value is loaded into the page table addre

| ess register?<br>DTD | PTR  |
|----------------------|------|
|                      | [25] |
| PI                   | PZ   |

| TIB   | _   |                  |
|-------|-----|------------------|
| - VAN | PPN | valod            |
| 017   |     | <del>0/1</del> 0 |
| -     | _   | 040)             |
|       |     |                  |

3) A processor has 16-bit addresses, 256 byte pages, and an 8-entry fully associative TLB with LRU replacement (the LRU field is 3 bits and encodes the order in which pages were accessed, 0 being the most recent). At some time instant, the TLB for the current process is the initial state given in the table below. Assume that all current page table entries are in the initial TLB. Assume also that all pages can be read from and written to. Fill in the final state of the TLB according to the access pattern below.

Free physical pages: 0x17, 0x18, 0x19

UPN: total bits-page offset : 16-8=8

| 1-       | Read        | 0x11f0 |
|----------|-------------|--------|
| J →      | Write       | 0x1301 |
| ~~~      | Write       | 0x20ae |
| 7 7      | Write       | 0x2332 |
| 45       | Read        | 0x20ff |
| 5/2      | Write       | 0x3415 |
| <i>'</i> |             |        |
| 6        | Initial TLB |        |
|          |             |        |

Access pattern:

UPN: OXII VDN: 0x13 VPN: 0x20 VPN: 0x23 VPN: 0x20 hit

| c.a LD   |           |        |       |     |
|----------|-----------|--------|-------|-----|
| VPN      | PPN       | Valid  | Dirty | LRU |
| 0x01     | 0x11      | 1      | 1     | 0   |
| 0x00.0 4 | 3 0x00°07 | 7 2    | 20    | 7   |
| 0x10     | 0x13      | 1      | 1     | 1   |
| 0x20     | 0x12      | 1_     | 2     | 5   |
| 0×00001  | 23 0x00 b | 18 9 1 | 2     | 7   |
| 0x11     | 0x14      | 1      | 0     | 4   |
| 0xac     | 0x15      | 1      | 1     | 2   |

| VPN       | PPN       | Valid  | Dirty | LRU | Levi | LEU 2 | CRU3     | URU 4          |   |
|-----------|-----------|--------|-------|-----|------|-------|----------|----------------|---|
| 0x01      | 0x11      | 1      | 1     | 0   | Ţ    | 2     | 3        | 4              |   |
| 0x00,0 4  | 3 0x00°0  | 7 2    | 0     | 7   | 7    | 0     | l l      | 2              |   |
| 0x10      | 0x13      | 1      | 1     | 1   | 2    | 3     | 4        | 5              | ĺ |
| 0x20      | 0x12      | 1_     | 21    | 5   | 9    | 67    | 12       | 1              |   |
| 0x000V    | 23 0x00 b | 18 9 1 | 2     | 7   | 7    | +     | <b>(</b> | 0              | ١ |
| 0x11      | 0x14      | 1      | 0     | 4   | 0    | 1     | 2        | 3              |   |
| 0xac      | 0x15      | 1      | 1     | 2   | 3    | 4     | 5        | 67             |   |
| 9xff 1    | 0x16      | 1      | 8     | 3   | 4    | 5     | 16 1     | , <i>†</i>     | l |
| 0134      | 02/9      |        |       | •   | 1 1  | سے را | 1 011    | $\overline{f}$ | , |
| Cimel TID |           |        |       |     | 1 0  | 116   |          | 1-             |   |

| sme<br>\ | \\ \forall 7  |
|----------|---------------|
| \        | \ <u>&gt;</u> |

| LRU |
|-----|
| 5   |
| 3   |
| 6   |
| 1   |
| 2   |
| Ц   |
| チ   |
| 0   |
|     |

| LRUS           | LRU 6    |
|----------------|----------|
| И              | 5        |
| 4 2            | 3<br>6   |
| 50             | 6        |
| 6              | ລູ່      |
| 3              | <u>4</u> |
| 6              | 7        |
| <del>(7)</del> | O        |
|                |          |



The benefits of multi-threading programming come only after you understand concurrency. Here are two most common concurrency issues:

- Cache-incoherence: each hardware thread has its own cache, hence data modified in one thread may not be immediately reflected in the other. The can often be solved by bypassing cache and writing directly to memory, i.e. using volatile keyword in many languages.
- The famous **Read-modify-write**: Read-modify-write is a very common pattern in programming. In the context of multi-thread programming, the interleaving of R,M,W stages often produces a lot of issues.

To solve problem with Read-modify-write, we have to rely on the idea of **undisrupted execution**.

In RISC-V, we have two categories of atomic instructions:

- Load-reserve, store-conditional (undisrupted execution across multiple instructions)
- Amo.swap (single, undisrupted memory operation) and other amo operations.

Both can be used to achieve atomic primitives, here are two examples.

```
Test-and-set
                                                    Compare-and-swap
                                                    #expect old value in a1,
Start:addi t0 x0 1
                           #locked state is 1
      amoswap.w.aq t1 t0 (a0)
hne t1 x0 Start  #if the lock is not
                                                     #desired new value in a2
                                                    Start: lr a3 (a0)
                                                           bne a3 a1 Fail #CAS fail
                            #free, retry
                                                           sc a3 a2 (a0)
                                                           bnez a3 Start #retry if store fai
      ... #critical section
                                                            ... #critical section
       amoswap.w.rl x0 x0 a0
                                   #release lock
                                                           amoswap.w.rl x0 x0 a0
                                                    Fail: #failed CAS
Instruction definitions:
```

- Load-reserve: Loads the four bytes from memory at address x[rs1], writes them to x[rd], sign-extending the result, and registers a reservation on that memory word.
  - Store-conditional: Stores the four bytes in register x[rs2] to memory at address x[rs1], provided there exists a load reservation on that memory address. Writes 0 to x[rd] if the store succeeded, or a nonzero error code otherwise.
- **Amoswap**: Atomically, let t be the value of the memory word at address x[rs1], then set that memory word to x[rs2]. Set x[rd] to the sign extension of t.

Question: why do we need special instructions for these operations? Why can't we use normal load and store for 1r and sc? Why can't we expand amoswap to a normal load and store?