Making Page Tables Smaller
The Core Problem
With linear page tables, memory usage becomes huge.
Example:
- 32-bit address space → 2³² bytes
- Page size = 4 KB (2¹²)
- Number of virtual pages = 2³² ÷ 2¹² = 2²⁰ ≈ 1 million
- Each PTE = 4 bytes
- Page table size per process = 2²⁰ × 4 bytes = 4 MB

If 100 processes are running:
- 4 MB × 100 = 400 MB just for page tables
That’s too much memory overhead.
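The arithmetic above can be checked with a short script (a minimal sketch using the example's assumptions of 4 KB pages and 4-byte PTEs):

```python
# Page table size for a 32-bit address space with 4 KB pages and 4-byte PTEs.
ADDRESS_SPACE = 2**32   # bytes of virtual address space
PAGE_SIZE = 2**12       # 4 KB pages
PTE_SIZE = 4            # bytes per page table entry

num_pages = ADDRESS_SPACE // PAGE_SIZE        # 2**20 virtual pages
table_size = num_pages * PTE_SIZE             # bytes per process
print(table_size // 2**20, "MB per process")
print(100 * table_size // 2**20, "MB for 100 processes")
```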
CRUX: How Do We Make Page Tables Smaller?
We need techniques to:
- Reduce memory usage
- Avoid storing unnecessary invalid entries
- Keep performance reasonable
Solution 1: Bigger Pages
Idea:
Increase page size → fewer virtual pages → smaller page table.
Example:
If page size = 16 KB instead of 4 KB:
- Number of PTEs decreases by 4×
- Page table becomes 1 MB instead of 4 MB
✅ Advantage:
- Smaller page table
❌ Problem: Internal Fragmentation
Large pages waste memory inside each page.
Example:
- Program needs 1 KB
- Page size = 16 KB
- 15 KB wasted
Most systems prefer small pages (4 KB or 8 KB).
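The trade-off can be made concrete with a small comparison (a sketch; the 1 KB allocation mirrors the example above):

```python
# Trade-off: bigger pages shrink the page table but waste more memory
# inside each partially used page (internal fragmentation).
ADDRESS_SPACE = 2**32
PTE_SIZE = 4
NEEDED = 1024            # program actually needs only 1 KB in its last page

for page_size in (4 * 1024, 16 * 1024):
    table_bytes = (ADDRESS_SPACE // page_size) * PTE_SIZE
    wasted = page_size - NEEDED          # unused bytes inside one page
    print(f"{page_size // 1024} KB pages: table = {table_bytes // 2**20} MB, "
          f"waste for a 1 KB need = {wasted // 1024} KB")
```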
Multiple Page Sizes
Modern CPUs like:
- MIPS
- SPARC
- x86-64
support multiple page sizes.
Why?
To reduce TLB pressure:
- Large pages cover more memory
- Fewer TLB entries are needed
- Common in databases
But:
- It makes OS design more complex
2. Hybrid Approach: Segmentation + Paging
Combine segmentation and paging.
Instead of:
- One page table per process
Use:
- One page table per segment (code, heap, stack)
This combines:
- Paging (good flexibility)
- Segmentation (saves page table space)
The idea was first used in Multics, designed by Jack Dennis and others.
The Problem: Linear Page Tables Waste Memory
Example given:
- 16 KB virtual address space
- 1 KB page size
- Total pages = 16
But only a few pages are actually used:
| Segment | VPN Used | Physical Page |
|---|---|---|
| Code | 0 | 10 |
| Heap | 4 | 23 |
| Stack | 14, 15 | 28, 4 |
Most page table entries are invalid.
So we are allocating page table space just to mark entries as invalid.
Now imagine this waste in a 32-bit address space…
The Hybrid Idea
Instead of:
One large page table for the entire address space
Use:
One page table per segment (code, heap, stack)
How the Hybrid Works
We combine segmentation and paging.
Step 1: Divide Address Space into Segments
In a 32-bit address:
Top 2 bits decide segment:
| Segment Bits | Meaning |
|---|---|
| 00 | Unused |
| 01 | Code |
| 10 | Heap |
| 11 | Stack |
Step 2: Each Segment Has:
- Base register → points to that segment's page table
- Bounds register → tells how many pages are valid
So each process now has 3 separate page tables:
- Code page table
- Heap page table
- Stack page table
Address Translation (On TLB Miss)
Hardware does:
1. Extract the Segment Number (SN)
2. Extract the VPN
3. Choose the correct base register using the SN
4. Compute the PTE address:
PTEAddr = Base[SN] + (VPN × sizeof(PTE))
This is similar to linear paging, just with a different base register per segment.
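The steps above can be sketched as a toy translator (all sizes and mappings are illustrative, not from any real architecture):

```python
# Toy hybrid segmentation+paging translation for a 16-bit address:
# 2 segment bits, 4 VPN bits, 10 offset bits (assumed layout).
SEG_BITS, VPN_BITS, OFFSET_BITS = 2, 4, 10

# Per-segment page tables: base[SN] stands in for the base register,
# bounds[SN] for the bounds register (number of valid pages).
base = {0b01: [10, 11, 12], 0b10: [23, 24], 0b11: [28, 4]}
bounds = {sn: len(pt) for sn, pt in base.items()}

def translate(vaddr):
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    vpn = (vaddr >> OFFSET_BITS) & ((1 << VPN_BITS) - 1)
    sn = vaddr >> (OFFSET_BITS + VPN_BITS)
    if sn not in base or vpn >= bounds[sn]:      # bounds-register check
        raise MemoryError("segmentation fault")
    pfn = base[sn][vpn]                          # PTE lookup in that segment
    return (pfn << OFFSET_BITS) | offset

# Code segment (01), page 1, offset 5 -> frame 11
print(hex(translate((0b01 << 14) | (1 << 10) | 5)))
```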
Why This Saves Memory
Suppose:
- Code uses 3 pages
- Heap uses 2 pages
- Stack uses 2 pages
Then:
- Code page table → 3 entries
- Heap page table → 2 entries
- Stack page table → 2 entries
Instead of 16 entries total.
We avoid allocating entries for unused gaps, so there is no wasted page table space between the heap and stack.
What Is the Bounds Register For?
The bounds register stores:
Maximum valid page number in that segment
Example:
- Code uses pages 0, 1, 2
- Bounds = 3
If the program tries to access page 4 in the code segment:
An exception occurs.
This protects memory and prevents out-of-bounds access.
✅ Advantages of the Hybrid Approach
- Reduces page table size
- Avoids storing lots of invalid entries
- Works well if segments are compact
❌ Problems With Hybrid Approach
1️⃣ Segmentation Limitations
Segmentation assumes:
- Code, heap, and stack are nicely separated
But if the heap is large and sparse:
- You still waste page table entries inside that segment
So waste is reduced, but not eliminated.
2️⃣ External Fragmentation Returns
Page tables are no longer page-sized.
They can be:
- 3 entries
- 7 entries
- 12 entries
- etc.
So the memory allocator must find variable-sized chunks of free memory.
This brings back external fragmentation, which is harder to manage than fixed-size pages.
Why People Moved On
Because the hybrid approach:
- Still wastes space for sparse segments
- Brings back segmentation complexity
- Causes external fragmentation
So better methods were developed:
- Multi-level page tables
- Inverted page tables
These remove unused entries more cleanly.
3. Multi-Level Page Tables
The Main Problem
A linear page table allocates space for every possible virtual page — even unused ones.
If most of the address space is empty:
The page table wastes memory storing invalid entries.
The Core Idea of Multi-Level Page Tables
Instead of keeping one huge page table:
- Break the page table into page-sized chunks
- Only allocate chunks that contain valid entries
- Use a new structure called a Page Directory to track them
This turns the page table into a small tree-like structure.
This approach is widely used in systems like x86.
Structure Overview (Two-Level Page Table)
Instead of:
One large linear page table indexed by the full VPN
We now have:
A page directory whose entries point to individual page-table pages
Key Components
1️⃣ Page Directory
- Contains Page Directory Entries (PDEs)
- One PDE per page of the page table
- Each PDE contains:
  - Valid bit
  - PFN (points to a page of the page table)
Important:
- If a PDE is invalid → that entire region is unused
- No page-table page is allocated for it
2️⃣ Page Table Pages
Each page-table page:
- Contains normal PTEs
- Each PTE maps VPN → PFN
Detailed Example
Given:
- Address space = 16 KB
- Page size = 64 bytes
- Virtual address = 14 bits
  - 8-bit VPN
  - 6-bit offset
So:
- Total virtual pages = 256
- Each PTE = 4 bytes
- Full linear page table = 256 × 4 = 1 KB
With 64-byte pages:
- 1 KB ÷ 64 = 16 page-table pages
- Each page holds 16 PTEs
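The sizing above follows mechanically from the parameters; a quick check:

```python
# Sizing the example: 16 KB address space, 64-byte pages, 4-byte PTEs.
ADDR_BITS = 14
PAGE_SIZE = 64
PTE_SIZE = 4

offset_bits = PAGE_SIZE.bit_length() - 1        # log2(64) = 6
vpn_bits = ADDR_BITS - offset_bits              # 8
num_pages = 1 << vpn_bits                       # 256 virtual pages
linear_table = num_pages * PTE_SIZE             # 1024 bytes (1 KB)
pt_pages = linear_table // PAGE_SIZE            # 16 page-table pages
ptes_per_page = PAGE_SIZE // PTE_SIZE           # 16 PTEs per page
print(num_pages, linear_table, pt_pages, ptes_per_page)
```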
How the Address Is Divided
Virtual address layout: an 8-bit VPN followed by a 6-bit offset.
The VPN is split into:
- Top 4 bits → Page Directory Index
- Bottom 4 bits → Page Table Index
Translation Steps (On TLB Miss)
Step 1: Extract the Page Directory Index
Use the top 4 bits of the VPN.
Compute:
PDEAddr = PageDirBase + (PDIndex × sizeof(PDE))
If the PDE is invalid:
- Raise an exception
If it is valid:
- It gives the PFN of the page-table page
Step 2: Extract the Page Table Index
Use the next 4 bits of the VPN.
Compute:
PTEAddr = (PDE.PFN << SHIFT) + (PTIndex × sizeof(PTE))
where SHIFT = log₂(page size) = 6 here. This gives the PTE.
Step 3: Form the Physical Address
PhysAddr = (PTE.PFN << SHIFT) + offset
Memory Savings
A linear page table needed:
- 16 page-table pages
The multi-level table needed:
- 1 page directory page
- 2 page-table pages (only the first and last regions were used)
So only 3 pages are used instead of 16: a huge memory saving.
For 32-bit or 64-bit systems, the savings are even larger.
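The two-level walk for this example can be sketched as follows (the specific VPN→PFN mappings are assumptions for illustration; unallocated directory slots are simply `None`):

```python
# Toy two-level lookup for the 16 KB / 64-byte-page example.
# Only the page directory and the allocated page-table pages exist.
OFFSET_BITS, INDEX_BITS = 6, 4        # 6-bit offset, 4+4-bit VPN

# page_dir[pd_index] -> page-table page (or None if the region is unused)
page_dir = [None] * 16
page_dir[0] = {0: 10}        # code region: VPN 0 -> PFN 10 (assumed)
page_dir[15] = {14: 55}      # stack region (assumed)

def translate(vaddr):
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    vpn = vaddr >> OFFSET_BITS
    pd_index, pt_index = vpn >> INDEX_BITS, vpn & 0xF
    page = page_dir[pd_index]
    if page is None or pt_index not in page:    # invalid PDE or PTE
        raise MemoryError("page fault")
    return (page[pt_index] << OFFSET_BITS) | offset

print(translate(12))    # VPN 0, offset 12 -> frame 10
```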
Time–Space Trade-Off
| Feature | Linear | Multi-Level |
|---|---|---|
| Memory Usage | High | Low |
| TLB Miss Cost | 1 memory access | 2 memory accesses |
| Complexity | Simple | More complex |
Why Slower on TLB Miss?
Linear:
- 1 memory load → PTE
Multi-level:
- 1 load → PDE
- 1 load → PTE
So 2 loads instead of 1.
But on a TLB hit, performance is identical.
Downsides
- More complex hardware/OS logic
- Slower TLB misses
- More bookkeeping
But this is usually worth it to save memory.
Summary
Multi-level page tables:
- Remove unused portions of the page table
- Support sparse address spaces
- Fit nicely into page-sized memory blocks
- Avoid the need for large contiguous memory
That's why most modern systems use them.
Why Two Levels May Not Be Enough
In 32-bit systems:
- Two levels are usually sufficient.
In 64-bit systems:
- The address space is huge
- Even the page directory becomes too large
So modern CPUs use:
- 3-level, 4-level, or even 5-level page tables
Multi-Level Page Tables: A Deeper Example
Goal
Reduce page table memory overhead while keeping address translation efficient.
Why Multi-Level Page Tables?
A linear (single-level) page table can be very large:
- Large virtual address space → many virtual pages
- Many virtual pages → huge page table
- Huge page table → wasted memory (especially for sparse address spaces)
Multi-level page tables solve this by:
- Breaking the page table into smaller pieces
- Allocating second-level tables only when needed
Basic Idea (Two-Level Example)
Instead of:
One large page table indexed directly by the full VPN
We use:
A page directory that points to smaller page tables, allocated only when needed
The virtual address is split into:
| Field | Purpose |
|---|---|
| Page Directory Index | Selects a page table |
| Page Table Index | Selects a PTE inside that table |
| Offset | Selects byte within page |
Example Breakdown
Given:
- 30-bit virtual address
- 512-byte page size
- 4-byte PTE
Step 1: Determine PTEs per page
512 bytes ÷ 4 bytes = 128 PTEs
So:
- 7 bits are needed for the page-table index (because log₂(128) = 7)
Step 2: Remaining bits
- Offset = 9 bits (log₂(512)), so VPN = 30 − 9 = 21 bits
- 7 bits are used for the PT index
- The remaining 14 bits would form the Page Directory index
Problem:
- A 14-bit directory → 2¹⁴ = 16,384 entries
- At 4 bytes each, that is 64 KB: far too large to fit in one 512-byte page
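The bit accounting that forces a third level can be verified directly:

```python
# Why two levels are not enough for a 30-bit address space / 512-byte pages.
ADDR_BITS, PAGE_SIZE, PTE_SIZE = 30, 512, 4

offset_bits = 9                                  # log2(512)
vpn_bits = ADDR_BITS - offset_bits               # 21
pt_index_bits = 7                                # log2(512 / 4) = 128 PTEs/page
pd_index_bits = vpn_bits - pt_index_bits         # 14

pd_entries = 1 << pd_index_bits                  # 16,384 entries
pd_bytes = pd_entries * PTE_SIZE                 # 64 KB: bigger than a page
print(pd_index_bits, pd_entries, pd_bytes, pd_bytes > PAGE_SIZE)
```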
Solution: Add Another Level
Split the page directory into multiple pages and add another directory above it.
Now the virtual address becomes:
| Field | Meaning |
|---|---|
| PD Index 0 | Top-level directory |
| PD Index 1 | Second-level directory |
| Page Table Index | Index into the page table |
| Offset | Byte offset |
Now:
- Each directory fits in one page
- Goal achieved: every structure fits within a single page
Translation Process
On every memory reference:
1️⃣ TLB Check First
- If TLB hit → translation completes immediately
- If TLB miss → a full multi-level lookup is required
2️⃣ Page Table Walk (On TLB Miss)
- Use the top-level directory index
- Fetch the second-level directory
- Fetch the page table entry (PTE)
- Check valid bits and protection
- Insert the translation into the TLB
- Retry the instruction
Cost:
- Each level adds one extra memory access
- Two-level table → 2 extra memory accesses on a TLB miss
- More levels → more accesses
Multi-Level Page Table Control Flow
Phase 1: Extract the VPN
- Extract the Virtual Page Number (VPN) from the virtual address.
- The lower bits are the offset.
- The upper bits form the VPN.
Phase 2: Check the TLB First
The hardware first checks the TLB (Translation Lookaside Buffer).
Case 1: TLB Hit
Step 1: Check Protection
- Verify read/write/execute permissions.
Step 2: Form the Physical Address
- Take the Physical Frame Number (PFN)
- Shift it to the correct position
- Add the offset
Step 3: Access Memory
✔ Done; no page table access is needed.
❌ If Protection Fails
The process attempted an illegal access.
Case 2: TLB Miss
If the VPN is not in the TLB, the hardware must walk the multi-level page table.
Step 1: Get the Page Directory Entry (PDE)
- Extract the Page Directory index
- Use the PDBR (Page Directory Base Register)
- Fetch the PDE from memory
❌ If the PDE Is Invalid
This means:
- The page table for this region doesn't exist
- Page fault or illegal access
Step 2: Get the Page Table Entry (PTE)
If the PDE is valid:
- Extract the Page Table index
- Use the PFN from the PDE
- Fetch the PTE
❌ If the PTE Is Invalid
The page is not present in memory.
❌ If Protection Fails
Permissions violation.
Step 3: Update the TLB
If everything is valid:
- Insert the translation into the TLB
- Future accesses will be faster
Step 4: Retry the Instruction
- The CPU restarts the instruction
- This time it will hit in the TLB
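This control flow can be sketched end to end. The structures below are toy stand-ins (dictionaries in place of hardware registers and memory), not a real MMU:

```python
# Sketch of the TLB-check-then-walk control flow for a two-level table.
OFFSET_BITS, INDEX_BITS = 6, 4

tlb = {}                                   # vpn -> (pfn, prot)
page_dir = {0: {0: (10, "rw")}}            # pd_index -> {pt_index: (pfn, prot)}

def access(vaddr, want="r"):
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    vpn = vaddr >> OFFSET_BITS
    if vpn in tlb:                          # Case 1: TLB hit
        pfn, prot = tlb[vpn]
    else:                                   # Case 2: TLB miss -> walk
        pd_index = vpn >> INDEX_BITS
        pt_index = vpn & ((1 << INDEX_BITS) - 1)
        if pd_index not in page_dir:        # invalid PDE
            raise MemoryError("page fault (no page-table page)")
        entry = page_dir[pd_index].get(pt_index)
        if entry is None:                   # invalid PTE
            raise MemoryError("page fault (page not present)")
        pfn, prot = entry
        tlb[vpn] = entry                    # update TLB, then retry
    if want not in prot:                    # protection check
        raise PermissionError("protection fault")
    return (pfn << OFFSET_BITS) | offset    # form the physical address

print(access(5))         # miss -> walk -> fills the TLB
print(access(5))         # now a TLB hit
```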
Summary Flow
TLB hit → check protection → form the physical address → access memory.
TLB miss → fetch PDE → fetch PTE → update TLB → retry the instruction.
Trade-Offs
| Advantage | Disadvantage |
|---|---|
| Saves memory | Slower on TLB miss |
| Works well for sparse address spaces | More complex |
| Allocates tables only when needed | Multiple memory lookups |
4. Inverted Page Tables
Goal
Reduce memory usage even further.
Basic Idea
Instead of:
- One page table per process
- One entry per virtual page
We use:
- One global page table
- One entry per physical frame
Structure
Each entry contains:
- Process ID (PID)
- Virtual page number (VPN)
- Other control bits
So: the table has one entry per physical frame, recording which (PID, VPN) pair currently occupies that frame.
How Lookup Works
To translate:
- Given a (PID, VPN) pair
- Search the inverted page table
- Find the matching entry
- The index of that entry is the physical frame number
Problem
Searching the entire table linearly is slow.
Solution:
Use a hash table to speed up the lookup.
Some architectures, like the PowerPC, use this approach.
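A toy version of this structure, with a hash index standing in for the hardware hash table (layout and values are illustrative):

```python
# Toy inverted page table: one entry per physical frame, plus a hash
# index so lookup avoids a linear scan of all frames.
NUM_FRAMES = 8

# frames[pfn] = (pid, vpn), or None if the frame is free
frames = [None] * NUM_FRAMES
frames[3] = (100, 7)       # process 100, virtual page 7 lives in frame 3
frames[5] = (200, 7)       # same VPN, different process -> different frame

# hash index rebuilt from the table: (pid, vpn) -> pfn
index = {entry: pfn for pfn, entry in enumerate(frames) if entry}

def lookup(pid, vpn):
    pfn = index.get((pid, vpn))        # O(1) instead of scanning frames
    if pfn is None:
        raise MemoryError("page fault")
    return pfn

print(lookup(100, 7), lookup(200, 7))
```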
Trade-Offs
| Advantage | Disadvantage |
|---|---|
| Huge memory savings | Lookup is complex |
| Only one table for entire system | Needs hashing |
| Scales well with large virtual spaces | More overhead per translation |
Multi-Level vs Inverted Page Tables
| Feature | Multi-Level | Inverted |
|---|---|---|
| Tables per process | Yes (one per process) | No (single global table) |
| One entry per | Virtual page | Physical frame |
| Memory usage | Medium | Very low |
| Lookup speed | Faster (structured walk) | Slower without hashing |
| Best for | Sparse address spaces | Systems with large memory |
Final Understanding
- Multi-level page tables reduce memory by allocating page tables only when needed.
- Inverted page tables reduce memory further by indexing physical frames instead of virtual pages.
- Both represent different trade-offs between:
  - Space
  - Lookup time
  - Complexity
At the end of the day, page tables are just data structures, and their design depends on system constraints.
5. Swapping Page Tables to Disk
The Basic Assumption (Before)
Normally, we assume:
- Page tables are stored in physical memory
- They are always resident in RAM
- The kernel directly accesses them during address translation
But this assumption breaks in large systems.
The Problem
Even with:
- Multi-level paging
- Inverted page tables
- Larger page sizes
page tables can still become very large, especially in:
- 64-bit systems
- Systems with many processes
- Systems with huge virtual address spaces
If memory becomes tight, page tables themselves may consume too much RAM.
The Solution: Put Page Tables in Virtual Memory
Instead of keeping page tables permanently in physical memory:
The system places page tables in kernel virtual memory.
This means:
- Page tables are treated like normal pages
- They can be swapped out to disk
- They can be brought back into memory when needed
How Swapping Page Tables Works
Step 1: Memory Pressure Occurs
RAM is nearly full.
Step 2: The Kernel Selects Some Page-Table Pages
- Infrequently used page-table pages are chosen.
- They are written to disk (swap space).
Step 3: The Page-Table Page Becomes "Not Present"
If the hardware tries to access a swapped-out page-table page:
- A page fault occurs.
- The OS loads the page-table page back from disk.
- Translation continues.
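The fault-and-reload cycle can be sketched as below; every name here (`swap_disk`, `pt_page_0`, the `present` flag) is an invented stand-in for the real kernel bookkeeping:

```python
# Sketch: a page-table page itself can be "not present" and must be
# faulted back in before the page table walk can continue.
swap_disk = {"pt_page_0": {0: 10}}     # page-table page stored on disk

page_dir = {0: {"present": False, "where": "pt_page_0"}}
memory_pt_pages = {}                   # pd_index -> in-memory page-table page

def get_pt_page(pd_index):
    pde = page_dir[pd_index]
    if not pde["present"]:             # page fault on the page table itself
        # "disk I/O": bring the page-table page back into memory
        memory_pt_pages[pd_index] = swap_disk.pop(pde["where"])
        pde["present"] = True
    return memory_pt_pages[pd_index]

pt = get_pt_page(0)     # first access: triggers the simulated disk load
print(pt[0])            # now the walk can continue normally
```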
What This Means During Translation
If a TLB miss occurs and:
- The needed page directory or page-table page is swapped out
Then:
- The hardware triggers a page fault.
- The OS loads that page-table page from disk.
- The page table walk resumes.
So now even page table access can cause page faults.
⚠️ Important Implication
Normally:
- TLB miss → 2 memory accesses (for a 2-level table)
With swapped page tables:
- TLB miss → possible disk I/O
This makes translation much slower in the worst case.
✅ Advantages
- Reduces RAM usage
- Allows extremely large address spaces
- Helps under heavy memory pressure
❌ Disadvantages
- Slower page faults
- More complex kernel design
- Page table access may trigger additional page faults
- Can increase latency significantly
Why This Is Rarely Noticeable
In practice:
- Frequently used page-table pages stay in memory.
- Only rarely used parts get swapped out.
- Good locality helps reduce the overhead.
Summary
Swapping page tables means:
Page tables themselves are pageable memory.
Instead of being permanently stored in RAM, they:
- Live in kernel virtual memory
- Can be swapped to disk
- Are loaded on demand
This increases flexibility and supports very large systems, but adds complexity and potential performance cost.