Making Page Tables Smaller
The Core Problem

With linear page tables, memory usage becomes huge.

Example:

  • 32-bit address space → 2³² bytes

  • Page size = 4KB (2¹²)

  • Number of virtual pages = 2²⁰ ≈ 1 million

  • Each PTE = 4 bytes

👉 Page table size per process = 2²⁰ × 4 bytes = 4 MB

If 100 processes are running:

  • 4MB × 100 = 400 MB just for page tables

That's too much memory overhead.
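The arithmetic can be checked with a few lines of Python:

```python
# Size of a linear page table for a 32-bit address space with 4 KB pages.
address_space_bits = 32
page_size = 4 * 1024              # 4 KB pages -> 12 offset bits
pte_size = 4                      # bytes per page table entry

num_pages = 2**address_space_bits // page_size   # 2^20 virtual pages
table_size = num_pages * pte_size                # bytes per process

print(num_pages)                      # 1048576
print(table_size // 2**20)            # 4 (MB per process)
print(100 * table_size // 2**20)      # 400 (MB for 100 processes)
```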


CRUX: How Do We Make Page Tables Smaller?

We need techniques to:

  • Reduce memory usage

  • Avoid storing unnecessary invalid entries

  • Keep performance reasonable

Solution 1: Bigger Pages

Idea:

Increase page size → fewer virtual pages → smaller page table.

Example:

If page size = 16KB instead of 4KB:

  • Number of PTEs decreases by 4×

  • Page table becomes 1MB instead of 4MB

✅ Advantage:

  • Smaller page table

❌ Problem: Internal Fragmentation

Large pages waste memory inside each page.

Example:

  • Program needs 1KB

  • Page size = 16KB

  • 15KB wasted

Most systems prefer small pages (4KB or 8KB).
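A quick way to quantify this waste (the 1 KB request and 16 KB page figures match the example above):

```python
# Internal fragmentation: bytes left unused in the final page of a request.
def wasted_bytes(request, page_size):
    remainder = request % page_size
    return 0 if remainder == 0 else page_size - remainder

print(wasted_bytes(1024, 16 * 1024))   # 15360 -> 15 KB wasted for a 1 KB request
print(wasted_bytes(1024, 4 * 1024))    # 3072  -> only 3 KB wasted with 4 KB pages
```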


Multiple Page Sizes

Modern CPUs like:

  • MIPS

  • SPARC

  • x86-64

support multiple page sizes.

Why?

To reduce TLB pressure:

  • Large pages cover more memory

  • Fewer TLB entries needed

  • Common in databases

But:

  • Makes OS design more complex

2. Hybrid Approach: Segmentation + Paging

💡 Combine segmentation and paging.

Instead of:

  • One page table per process

Use:

  • One page table per segment (code, heap, stack)

This section explains a hybrid approach that combines:
  • 📄 Paging (good flexibility)

  • 📦 Segmentation (saves memory space)

The idea was first used in Multics, designed by Jack Dennis and others.


The Problem: Linear Page Tables Waste Memory

Example given:

  • 16 KB virtual address space

  • 1 KB page size

  • Total pages = 16

But only a few pages are actually used:

Segment    VPNs Used    Physical Pages
Code       0            10
Heap       4            23
Stack      14, 15       28, 4

👉 Most page table entries are invalid.

So we are allocating page table space just to mark entries as invalid.

Now imagine this waste in a 32-bit address space…





The Hybrid Idea

Instead of one large page table for the entire address space, use one page table per segment (code, heap, stack).


How the Hybrid Works

We combine segmentation and paging.

Step 1: Divide Address Space into Segments

In a 32-bit address:

| Segment | VPN | Offset |



Top 2 bits decide segment:

Segment Bits    Meaning
00    Unused
01    Code
10    Heap
11    Stack

Step 2: Each Segment Has:

  • Base register → points to that segment's page table

  • Bounds register → tells how many pages are valid

So each process now has:

  • 3 separate page tables

    • Code page table

    • Heap page table

    • Stack page table


 Address Translation (On TLB Miss)

Hardware does:

  1. Extract Segment Number (SN)

  2. Extract VPN

  3. Choose correct base register using SN

  4. Compute PTE address:

AddressOfPTE = Base[SN] + (VPN × size_of_PTE)

This is similar to linear paging — just using different base registers.
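A minimal sketch of this lookup, assuming the 32-bit layout from Step 1 (2 segment bits, 4 KB pages); the base and bounds register values are invented for illustration:

```python
# Hybrid segmentation + paging: compute a PTE address on a TLB miss.
PTE_SIZE = 4
SEG_SHIFT = 30                            # top 2 bits select the segment
VPN_SHIFT = 12                            # 4 KB pages -> 12 offset bits
VPN_MASK = (1 << (SEG_SHIFT - VPN_SHIFT)) - 1

# Per-segment page-table "base registers" and bounds (count of valid pages).
base = {0b01: 0x1000, 0b10: 0x2000, 0b11: 0x3000}   # code, heap, stack
bounds = {0b01: 3, 0b10: 2, 0b11: 2}

def pte_address(vaddr):
    seg = vaddr >> SEG_SHIFT
    vpn = (vaddr >> VPN_SHIFT) & VPN_MASK
    if seg not in base:                   # unused segment (00)
        raise Exception("SEGMENTATION_FAULT")
    if vpn >= bounds[seg]:                # beyond the bounds register
        raise Exception("SEGMENTATION_FAULT")
    return base[seg] + vpn * PTE_SIZE     # Base[SN] + VPN * sizeof(PTE)

print(hex(pte_address((0b01 << 30) | (2 << 12))))   # code page 2 -> 0x1008
```

Note how the bounds check means no PTE storage is ever needed past the last valid page of each segment.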


Why This Saves Memory

Suppose:

  • Code uses 3 pages

  • Heap uses 2 pages

  • Stack uses 2 pages

Then:

  • Code page table → 3 entries

  • Heap page table → 2 entries

  • Stack page table → 2 entries

Instead of 16 entries total.

👉 We avoid allocating entries for unused gaps.

So no wasted page table space between heap and stack.


What Is the Bounds Register For?

The bounds register stores:

Maximum valid page number in that segment

Example:

  • Code uses pages 0,1,2

  • Bounds = 3

If program tries to access page 4 in code segment:

🚨 Exception occurs

This protects memory and prevents overflow.


✅ Advantages of Hybrid Approach

  • Reduces page table size

  • Avoids storing lots of invalid entries

  • Works well if segments are compact


❌ Problems With Hybrid Approach

1️⃣ Segmentation Limitations

Segmentation assumes:

  • Code, heap, stack are nicely separated

But if heap is large and sparse:

  • You still waste page table entries inside that segment

So waste is reduced, but not eliminated.


2️⃣ External Fragmentation Returns

Page tables are no longer page-sized.

They can be:

  • 3 entries

  • 7 entries

  • 12 entries

  • etc.

So memory allocator must find variable-sized chunks.

This creates external fragmentation again.

Harder to manage than fixed-size pages.


🧩 Why People Moved On

Because:

  • Still wastes space for sparse segments

  • Brings back segmentation complexity

  • Causes external fragmentation

So better methods were developed:

👉 Multi-level page tables
👉 Inverted page tables

These remove unused entries more cleanly.


3. Multi-Level Page Tables

 The Main Problem

A linear page table allocates space for every possible virtual page — even unused ones.

If most of the address space is empty:

👉 The page table wastes memory storing invalid entries.


The Core Idea of Multi-Level Page Tables

Instead of keeping one huge page table:

  1. Break the page table into page-sized chunks

  2. Only allocate chunks that contain valid entries

  3. Use a new structure called a Page Directory to track them

This turns the page table into a small tree-like structure.

This approach is widely used in systems like x86.


Structure Overview (Two-Level Page Table)

Instead of:

Virtual Address → Page Table → Physical Frame

We now have:

Virtual Address → Page Directory → Page Table Page → Physical Frame

Key Components

1️⃣ Page Directory

  • Contains Page Directory Entries (PDEs)

  • One PDE per page of the page table

  • Each PDE contains:

    • Valid bit

    • PFN (points to a page of page table)

Important:

  • If PDE is invalid → that entire region is unused

  • No page-table page is allocated for it


2️⃣ Page Table Pages

Each page-table page:

  • Contains normal PTEs

  • Each PTE maps VPN → PFN



 Detailed Example 

Given:

  • Address space = 16 KB

  • Page size = 64 bytes

  • Virtual address = 14 bits

    • 8-bit VPN

    • 6-bit offset

So:

  • Total virtual pages = 256

  • Each PTE = 4 bytes

  • Full linear page table = 256 × 4 = 1 KB

With 64-byte pages:

  • 1 KB ÷ 64 = 16 page-table pages

  • Each page holds 16 PTEs


How Address Is Divided

Virtual address layout:

| VPN (8 bits) | Offset (6 bits) |

VPN is split into:

| Page Directory Index (4 bits) | Page Table Index (4 bits) |
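This split can be expressed with masks and shifts (bit widths taken from the example above):

```python
# Split a 14-bit virtual address: | PD index (4) | PT index (4) | offset (6) |
OFFSET_BITS = 6
PT_BITS = 4
PD_BITS = 4

def split(vaddr):
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    pt_index = (vaddr >> OFFSET_BITS) & ((1 << PT_BITS) - 1)
    pd_index = (vaddr >> (OFFSET_BITS + PT_BITS)) & ((1 << PD_BITS) - 1)
    return pd_index, pt_index, offset

print(split(0x3F80))   # (15, 14, 0)
```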



Translation Steps (On TLB Miss)

Step 1: Extract Page Directory Index

Use top 4 bits of VPN.

Compute:

PDEAddr = PageDirBase + (PDIndex × sizeof(PDE))

If PDE is invalid:

  • Raise exception

If valid:

  • It gives PFN of page-table page


Step 2: Extract Page Table Index

Use next 4 bits of VPN.

Compute:

PTEAddr = (PDE.PFN << SHIFT) + (PTIndex × sizeof(PTE))

This gives the PTE.


Step 3: Form Physical Address

Physical Address = (PTE.PFN << SHIFT) + offset

Example

Virtual address: 0x3F80

Binary: 11 1111 1000 0000

This refers to:

  • VPN = 254

  • Offset = 0


Step 1: Directory Index

Top 4 bits of VPN = 1111 = 15

→ Go to last PDE
→ It is valid
→ Points to page-table page at PFN 101


Step 2: Page Table Index

Next 4 bits = 1110 = 14

→ Look at 14th entry in that page-table page
→ It says VPN 254 → PFN 55


Step 3: Form Physical Address

PFN = 55
Offset = 0

So:

PhysAddr = (55 << 6) + 0 = 0x0DC0

Translation complete ✅
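The three steps can be checked with a short script that hard-codes the page-directory and page-table contents given above:

```python
# Reproduce the worked example: translate virtual address 0x3F80.
SHIFT = 6                                 # 64-byte pages -> 6 offset bits

page_directory = {15: 101}                # PDE 15 valid, page-table page at PFN 101
page_table_pages = {101: {14: 55}}        # entry 14 of that page maps to PFN 55

def translate(vaddr):
    vpn = vaddr >> SHIFT
    offset = vaddr & ((1 << SHIFT) - 1)
    pd_index = vpn >> 4                   # top 4 bits of the 8-bit VPN
    pt_index = vpn & 0xF                  # bottom 4 bits
    pfn = page_table_pages[page_directory[pd_index]][pt_index]
    return (pfn << SHIFT) | offset

print(hex(translate(0x3F80)))             # 0xdc0
```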



Memory Savings

Linear page table needed:

  • 16 page-table pages

Multi-level needed:

  • 1 page directory page

  • 2 page-table pages (only first and last regions used)

So only 3 pages instead of 16

Huge memory savings.

For 32-bit or 64-bit systems, savings are even larger.


Time–Space Trade-Off

Feature          Linear             Multi-Level
Memory usage     High               Low
TLB miss cost    1 memory access    2 memory accesses
Complexity       Simple             More complex

Why Slower on TLB Miss?

Linear:

  • 1 memory load → PTE

Multi-level:

  • 1 load → PDE

  • 1 load → PTE

So 2 loads instead of 1.

But on a TLB hit, performance is identical.


Downsides

  1. More complex hardware/OS logic

  2. Slower TLB misses

  3. More bookkeeping

But usually worth it to save memory.


Summary

Multi-level page tables:

  • Remove unused portions of page table

  • Support sparse address spaces

  • Fit nicely into page-sized memory blocks

  • Avoid need for large contiguous memory

That's why most modern systems use them.


Why Two Levels May Not Be Enough

In 32-bit systems:

  • Two levels usually sufficient.

In 64-bit systems:

  • Address space is huge

  • Even page directory becomes too large

So modern CPUs use:

  • 3-level, 4-level, even 5-level page tables



Multi-Level Page Tables: More Than Two Levels

🎯 Goal

Reduce page table memory overhead while keeping address translation efficient.


🔹 Why Multi-Level Page Tables?

A linear (single-level) page table can be very large:

  • Large virtual address space → many virtual pages

  • Many virtual pages → huge page table

  • Huge page table → wastes memory (especially for sparse address spaces)

Multi-level page tables solve this by:

  • Breaking the page table into smaller pieces

  • Allocating second-level tables only when needed


Basic Idea (Two-Level Example)

Instead of:

Virtual Address → Page Table → Physical Frame

We use:

Virtual Address → Page Directory → Page Table → Physical Frame

The virtual address is split into:

Field                   Purpose
Page Directory Index    Selects a page table
Page Table Index        Selects a PTE inside that table
Offset                  Selects byte within page




🔎 Example Breakdown

Given:

  • 30-bit virtual address

  • 512-byte page size

  • 4-byte PTE

Step 1: Determine PTEs per page

512 bytes / 4 bytes = 128 PTEs

So:

  • 7 bits needed for page-table index (because log₂(128) = 7)

Step 2: Remaining bits

  • VPN = 21 bits

  • 7 bits used for PT index

  • Remaining 14 bits used for Page Directory index

Problem:

  • 14-bit directory → 2¹⁴ = 16,384 entries

  • At 4 bytes each, that is 64 KB, far larger than a single 512-byte page
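The bit arithmetic can be verified directly (assuming 4-byte directory entries, matching the PTE size):

```python
import math

# Bit budget for the 30-bit example: why a two-level design overflows a page.
va_bits = 30
page_size = 512
pte_size = 4

offset_bits = int(math.log2(page_size))                 # 9
vpn_bits = va_bits - offset_bits                        # 21
pt_index_bits = int(math.log2(page_size // pte_size))   # 7 (128 PTEs per page)
pd_index_bits = vpn_bits - pt_index_bits                # 14

directory_size = 2**pd_index_bits * pte_size            # bytes
print(pd_index_bits, directory_size, directory_size > page_size)  # 14 65536 True
```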


🔺 Solution: Add Another Level

Split the page directory into multiple pages and add another directory above it.

Now virtual address becomes:

Field               Meaning
PD Index 0          Top-level directory
PD Index 1          Second-level directory
Page Table Index    Index into page table
Offset              Byte offset

Now:

  • Each directory fits in one page

  • Goal achieved: Every structure fits within a single page


πŸ” Translation Process

On every memory reference:

1️⃣ TLB Check First

  • If TLB hit → translation done immediately

  • If TLB miss → full multi-level lookup required

2️⃣ Page Table Walk (On TLB Miss)

  1. Use top-level directory index

  2. Fetch second-level directory

  3. Fetch page table entry (PTE)

  4. Check valid bits and protection

  5. Insert translation into TLB

  6. Retry instruction

Cost:

  • Each level adds one extra memory access

  • Two-level table → 2 extra memory accesses on TLB miss

  • More levels → more accesses


Multi-Level Page Table Control Flow


Phase 1: Extract the VPN

VPN = (VirtualAddress & VPN_MASK) >> SHIFT
  • Extract the Virtual Page Number (VPN) from the virtual address.

  • The lower bits are the offset.

  • The upper bits form the VPN.


Phase 2: Check the TLB First

(Success, TlbEntry) = TLB_Lookup(VPN)

The hardware first checks the TLB (Translation Lookaside Buffer).


Case 1: TLB Hit

if (Success == True)

Step 1: Check Protection

if (CanAccess(TlbEntry.ProtectBits) == True)
  • Verify read/write/execute permissions.

Step 2: Form Physical Address

Offset = VirtualAddress & OFFSET_MASK
PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
  • Take Physical Frame Number (PFN)

  • Shift it to correct position

  • Add offset

Step 3: Access Memory

Register = AccessMemory(PhysAddr)

✔ Done — no page table access needed


❌ If Protection Fails

RaiseException(PROTECTION_FAULT)

The process attempted illegal access.


Case 2: TLB Miss

If the VPN is not in the TLB:

else // TLB Miss

Now the hardware must walk the multi-level page table.


Step 1: Get Page Directory Entry (PDE)

PDIndex = (VPN & PD_MASK) >> PD_SHIFT
PDEAddr = PDBR + (PDIndex * sizeof(PDE))
PDE = AccessMemory(PDEAddr)
  • Extract Page Directory index

  • Use PDBR (Page Directory Base Register)

  • Fetch the PDE from memory


❌ If PDE Invalid

RaiseException(SEGMENTATION_FAULT)

Means:

  • The page table for this region doesn't exist

  • Page fault or illegal access


Step 2: Get Page Table Entry (PTE)

If PDE is valid:

PTIndex = (VPN & PT_MASK) >> PT_SHIFT
PTEAddr = (PDE.PFN << SHIFT) + (PTIndex * sizeof(PTE))
PTE = AccessMemory(PTEAddr)
  • Extract Page Table index

  • Use PFN from PDE

  • Fetch PTE


❌ If PTE Invalid

RaiseException(SEGMENTATION_FAULT)

Page is not present in memory.


❌ If Protection Fails

RaiseException(PROTECTION_FAULT)

Permissions violation.


Step 3: Update TLB

If everything is valid:

TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
  • Insert translation into TLB

  • So future accesses are faster


Step 4: Retry Instruction

RetryInstruction()
  • The CPU restarts the instruction

  • This time it will hit in the TLB


 Summary Flow

Memory Access
 ↓
Check TLB
 ↓ Hit? → Yes → Form Physical Address → Done
 ↓ No
Read PDE
 ↓ Valid?
Read PTE
 ↓ Valid + Allowed?
Insert into TLB
 ↓
Retry Instruction
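The control-flow phases can be condensed into a small simulation. The dict-based TLB, page directory, and page-table contents are invented for illustration; the logic (check TLB, walk directory then table, insert, retry) follows the steps in the text:

```python
# Minimal simulation of the two-level page-table walk on a TLB miss.
SHIFT = 6                                  # 64-byte pages, as in the running example
OFFSET_MASK = (1 << SHIFT) - 1

tlb = {}                                   # VPN -> PFN
page_directory = {15: 101}                 # PDIndex -> PFN of a page-table page
memory = {101: {14: 55}}                   # page-table pages: PFN -> {PTIndex: PFN}

def access(vaddr):
    vpn = vaddr >> SHIFT
    offset = vaddr & OFFSET_MASK
    if vpn in tlb:                         # TLB hit: translate directly
        return (tlb[vpn] << SHIFT) | offset
    pd_index, pt_index = vpn >> 4, vpn & 0xF
    if pd_index not in page_directory:     # invalid PDE
        raise Exception("SEGMENTATION_FAULT")
    pt_page = memory[page_directory[pd_index]]
    if pt_index not in pt_page:            # invalid PTE
        raise Exception("SEGMENTATION_FAULT")
    tlb[vpn] = pt_page[pt_index]           # insert translation into the TLB
    return access(vaddr)                   # retry: now hits in the TLB

print(hex(access(0x3F80)))                 # 0xdc0 (miss, walk, retry)
print(hex(access(0x3F80)))                 # 0xdc0 (TLB hit this time)
```

Protection bits are omitted here to keep the sketch short; a real walk would also check them at both the TLB-hit and PTE steps.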

Trade-Offs

Advantage                               Disadvantage
Saves memory                            Slower on TLB miss
Works well for sparse address spaces    More complex
Allocates tables only when needed       Multiple memory lookups

4. Inverted Page Tables

Goal

Reduce memory usage even further.


🔹 Basic Idea

Instead of:

  • One page table per process

  • Entry per virtual page

We use:

  • One global page table

  • Entry per physical frame


πŸ” Structure

Each entry contains:

  • Process ID (PID)

  • Virtual page number (VPN)

  • Other control bits

So:

Physical Frame → (PID, VPN)

How Lookup Works

To translate:

  1. Given (PID, VPN)

  2. Search inverted page table

  3. Find matching entry

  4. Get physical frame


Problem

Searching entire table linearly is slow.

 Solution:

Use a hash table to speed lookup.

Some architectures like the PowerPC use this approach.
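A toy sketch of the structure, using a Python dict as the hash index; the frame contents and PIDs are invented:

```python
# Inverted page table: one entry per physical frame, storing (PID, VPN),
# plus a hash map from (PID, VPN) back to the frame for fast lookup.
frames = [                     # list index = physical frame number
    ("free", None),
    (7, 0),                    # frame 1 holds PID 7's virtual page 0
    (7, 4),                    # frame 2 holds PID 7's virtual page 4
    (3, 0),                    # frame 3 holds PID 3's virtual page 0
]

# Hash index built over the table (the search PowerPC-style hardware accelerates).
index = {(pid, vpn): pfn
         for pfn, (pid, vpn) in enumerate(frames) if pid != "free"}

def lookup(pid, vpn):
    if (pid, vpn) not in index:
        raise Exception("PAGE_FAULT")      # no frame holds this page
    return index[(pid, vpn)]

print(lookup(7, 4))   # 2
print(lookup(3, 0))   # 3
```

Note that the table size is proportional to physical memory, not to the number of processes or the size of their virtual address spaces.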


Trade-Offs

Advantage                                Disadvantage
Huge memory savings                      Lookup is complex
Only one table for entire system         Needs hashing
Scales well with large virtual spaces    More overhead per translation

Multi-Level vs Inverted Page Tables

Feature              Multi-Level                Inverted
Tables per process   Yes                        No (single global table)
Entries indexed by   Virtual page               Physical frame
Memory usage         Medium                     Very low
Lookup speed         Faster (structured walk)   Slower without hashing
Best for             Sparse address spaces      Systems with large memory

Final Understanding

  • Multi-level page tables reduce memory by allocating page tables only when needed.

  • Inverted page tables reduce memory further by indexing physical frames instead of virtual pages.

  • Both represent different trade-offs between:

    • Space

    • Lookup time

    • Complexity

At the end of the day, page tables are just data structures, and their design depends on system constraints.

5. Swapping Page Tables to Disk

🔹 The Basic Assumption (Before)

Normally, we assume:

  • Page tables are stored in physical memory

  • They are always resident in RAM

  • The kernel directly accesses them during address translation

But this assumption breaks in large systems.


The Problem

Even with:

  • Multi-level paging

  • Inverted page tables

  • Larger page sizes

Page tables can still become very large, especially in:

  • 64-bit systems

  • Systems with many processes

  • Systems with huge virtual address spaces

If memory becomes tight, page tables themselves may consume too much RAM.


The Solution: Put Page Tables in Virtual Memory

Instead of keeping page tables permanently in physical memory:

👉 The system places page tables in kernel virtual memory.

This means:

  • Page tables are treated like normal pages

  • They can be swapped out to disk

  • They can be brought back into memory when needed


πŸ” How Swapping Page Tables Works

Step 1: Memory Pressure Occurs

RAM is nearly full.

Step 2: Kernel Selects Some Page Table Pages

  • Infrequently used page table pages are chosen.

  • They are written to disk (swap space).

Step 3: Page Table Entry Becomes "Not Present"

If hardware tries to access a swapped-out page table page:

  • A page fault occurs.

  • OS loads the page table page back from disk.

  • Translation continues.


 What This Means During Translation

If a TLB miss occurs and:

  • The needed page directory or page table page is swapped out

Then:

  1. Hardware triggers a page fault.

  2. OS loads that page table page from disk.

  3. The page table walk resumes.

So now even page table access can cause page faults.


⚠️ Important Implication

Normally:

  • TLB miss → 2 memory accesses (for 2-level table)

With swapped page tables:

  • TLB miss → possible disk I/O

This makes translation much slower in worst-case scenarios.


📊 Advantages

  • Reduces RAM usage

  • Allows extremely large address spaces

  • Helps under heavy memory pressure


❌ Disadvantages

  • Slower page faults

  • More complex kernel design

  • Page table access may trigger additional page faults

  • Can increase latency significantly


Why This Is Rarely Noticeable

In practice:

  • Frequently used page table pages stay in memory.

  • Only rarely used parts get swapped out.

  • Good locality helps reduce overhead.


Summary

Swapping page tables means:

Page tables themselves are pageable memory.

Instead of being permanently stored in RAM, they:

  • Live in kernel virtual memory

  • Can be swapped to disk

  • Are loaded on demand

This increases flexibility and supports very large systems, but adds complexity and potential performance cost.
