Write the physical page allocator. It keeps track of which pages are free with a linked list of struct PageInfo objects (which, unlike in xv6, are not embedded in the free pages themselves), each corresponding to a physical page.
Exercise 1
Implement the following functions.
1. `boot_alloc()`
2. `mem_init()` (only up to the call to `check_page_free_list(1)`)
3. `page_init()`
4. `page_alloc()`
5. `page_free()`
boot_alloc() is a simple physical allocator which is used only while the kernel is setting up its virtual memory system.
```c
static void *
boot_alloc(uint32_t n)
{
	static char *nextfree;	// virtual address of next byte of free memory
	char *result;

	// Initialize nextfree if this is the first time.
	// 'end' is a magic symbol automatically generated by the linker,
	// which points to the end of the kernel's bss segment:
	// the first virtual address that the linker did *not* assign
	// to any kernel code or global variables.
	if (!nextfree) {
		extern char end[];
		nextfree = ROUNDUP((char *) end, PGSIZE);
	}

	// Allocate a chunk large enough to hold 'n' bytes, then update
	// nextfree.  Make sure nextfree is kept aligned
	// to a multiple of PGSIZE.
	if ((uint32_t) nextfree - KERNBASE > (npages * PGSIZE - n))
		panic("out of memory");
	result = nextfree;
	nextfree = ROUNDUP(nextfree + n, PGSIZE);
	return result;
}
```
In mem_init(), add code to allocate an array of npages struct PageInfo structures and store it in pages. The kernel uses this array to keep track of physical pages.
```c
	// the rest of base memory
	for (i = 1; i < npages_basemem; i++) {
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i];
	}

	// IO hole
	for (i = IOPHYSMEM >> 12; i < EXTPHYSMEM >> 12; i++) {
		pages[i].pp_ref = 1;
		pages[i].pp_link = NULL;
	}

	// in use extended memory, including kernel, kern_pgdir and pages array
	for (i = EXTPHYSMEM >> 12; i < end_of_inuse_page; i++) {
		pages[i].pp_ref = 1;
		pages[i].pp_link = NULL;
	}

	// free extended memory
	for (i = end_of_inuse_page; i < npages; i++) {
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i];
	}
}
```
page_alloc() is used to allocate a physical page from the free list.
```c
struct PageInfo *
page_alloc(int alloc_flags)
{
	// pop the first page off the free list
	struct PageInfo *victim = page_free_list;
	if (victim == NULL)
		return NULL;
	page_free_list = victim->pp_link;
	victim->pp_link = NULL;

	if (alloc_flags & ALLOC_ZERO)
		memset(page2kva(victim), 0, PGSIZE);

	return victim;
}
```
page_free() is used to free a physical page and return it to the free list. We need to check that the page is no longer in use (pp_ref is zero) and that it has not already been freed (pp_link is NULL).
```c
void
page_free(struct PageInfo *pp)
{
	// Fill this function in
	// Hint: You may want to panic if pp->pp_ref is nonzero or
	// pp->pp_link is not NULL.
	if (pp->pp_ref)
		panic("page_free: reference count non zero");
	if (pp->pp_link)
		panic("page_free: page already freed");

	pp->pp_link = page_free_list;
	page_free_list = pp;
}
```
In x86 terminology, a virtual address consists of a segment selector and an offset within the segment. A linear address is what you get after segment translation but before page translation. A physical address is what you finally get after both segment and page translation and what ultimately goes out on the hardware bus to your RAM.
In our implementation, the GDT installed in boot/boot.s effectively disables segment translation by setting all segment base addresses to 0 and limits to 0xffffffff. Hence the "selector" has no effect and the linear address always equals the offset of the virtual address.
In the previous lab, we installed a simple page table to map only 4MB of physical memory at 0x00100000 to the virtual space starting at 0xf0100000. This is sufficient for the kernel to start up, but we will need to set up the complete kernel space in the following part.
Question 1
| C type | Address type |
|--------------|----------|
| `T*` | Virtual |
| `uintptr_t` | Virtual |
| `physaddr_t` | Physical |
Assuming that the following JOS kernel code is correct, what type should variable x have, uintptr_t or physaddr_t?
```c
mystery_t x;
char *value = return_a_pointer();
*value = 10;
x = (mystery_t) value;
```
The answer is uintptr_t: value is dereferenced, so it must hold a virtual address, and uintptr_t is the integer type for virtual addresses.
Reference Counting
We keep a count of the number of references to each physical page in the pp_ref field of the struct PageInfo corresponding to the physical page.
When this count goes to zero for a physical page, that page can be freed because it is no longer used. In general, this count should be equal to the number of times the physical page appears below UTOP in all page tables (the mappings above UTOP are mostly set up at boot time by the kernel and should never be freed, so there’s no need to reference count them).
We’ll also use it to keep track of the number of pointers we keep to the page directory pages and, in turn, of the number of references the page directories have to page table pages.
pgdir_walk() finds the relevant page table entry in the page table rooted at pgdir for a given virtual address. If create is nonzero, it allocates the corresponding page table page if it does not exist yet. The function returns a pointer to the found page table entry, or NULL on failure.
boot_map_region() maps [va, va+size) of virtual address space to physical [pa, pa+size) in the page table rooted at pgdir. It's intended to set up the static mappings above UTOP, so the pp_ref field of the mapped pages isn't changed.
```c
static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
	// mind the alignment: round the page count up so a partial
	// trailing page still gets mapped
	int cnt = size / PGSIZE;
	if (size % PGSIZE)
		cnt++;

	for (int i = 0; i < cnt; ++i) {
		pte_t *pte_ptr = pgdir_walk(pgdir, (const void *) va, 1);
		if (pte_ptr == NULL)
			panic("boot_map_region: out of memory");
		(*pte_ptr) = pa | perm | PTE_P;
		va += PGSIZE;
		pa += PGSIZE;
	}
}
```
page_lookup() returns the page mapped at virtual address va. If pte_store is not NULL, then we store in it the address of the PTE for this page.
page_remove(), in turn, uses page_lookup() to find the page mapped at va, clears its PTE, invalidates the stale TLB entry, and drops the reference:

```c
void
page_remove(pde_t *pgdir, void *va)
{
	pte_t *pte_ptr;
	struct PageInfo *page = page_lookup(pgdir, va, &pte_ptr);

	if (pte_ptr != NULL && page != NULL) {
		(*pte_ptr) = 0;
		tlb_invalidate(pgdir, va);
		page_decref(page);
	}
}
```
page_insert() maps the physical page pp at virtual address va. If there is already a page mapped at va, it will be removed using page_remove(), and pp_ref is incremented if the insertion succeeds. There is a corner case which requires attention: if the same pp is re-inserted at the same virtual address in the same pgdir, and we call page_remove() before incrementing pp_ref, the page may be freed by the call to page_decref() inside page_remove(). By incrementing pp_ref before calling page_remove(), we handle this corner case elegantly.
```c
int
page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
	pte_t *pte_ptr = pgdir_walk(pgdir, va, 1);

	if (pte_ptr == NULL)
		return -E_NO_MEM;

	// increment pp_ref before page_remove() to handle corner case
	pp->pp_ref++;
	if ((*pte_ptr) & PTE_P)
		page_remove(pgdir, va);

	(*pte_ptr) = page2pa(pp) | perm | PTE_P;
	return 0;
}
```
How the PDE and PTE permission bits combine (S/U is the user bit, R/W the write bit; x means no user access at all):

| PDE U/S | PDE R/W | PTE U/S | PTE R/W | Privilege | User access |
|---------|---------|---------|---------|-----------|-------------|
| S-0 | R-0 | S-0 | R-0 | S | x |
| S-0 | R-0 | S-0 | W-1 | S | x |
| S-0 | R-0 | U-1 | R-0 | S | x |
| S-0 | R-0 | U-1 | W-1 | S | x |
| S-0 | W-1 | S-0 | R-0 | S | x |
| S-0 | W-1 | S-0 | W-1 | S | x |
| S-0 | W-1 | U-1 | R-0 | S | x |
| S-0 | W-1 | U-1 | W-1 | S | x |
| U-1 | R-0 | S-0 | R-0 | S | x |
| U-1 | R-0 | S-0 | W-1 | S | x |
| U-1 | R-0 | U-1 | R-0 | U | R |
| U-1 | R-0 | U-1 | W-1 | U | R |
| U-1 | W-1 | S-0 | R-0 | S | x |
| U-1 | W-1 | S-0 | W-1 | S | x |
| U-1 | W-1 | U-1 | R-0 | U | R |
| U-1 | W-1 | U-1 | W-1 | U | W |
Initializing the Kernel Address Space
Now we use boot_map_region which we just implemented to initialize the kernel address space above UTOP.
Exercise 5
Fill in the missing code in mem_init() after the call to check_page().
```c
	//////////////////////////////////////////////////////////////////////
	// Map 'pages' read-only by the user at linear address UPAGES
	// Permissions:
	//    - the new image at UPAGES -- kernel R, user R
	//      (ie. perm = PTE_U | PTE_P)
	//    - pages itself -- kernel RW, user NONE
	boot_map_region(kern_pgdir, UPAGES, PTSIZE, PADDR(pages), PTE_U);
```
```c
	//////////////////////////////////////////////////////////////////////
	// Use the physical memory that 'bootstack' refers to as the kernel
	// stack.  The kernel stack grows down from virtual address KSTACKTOP.
	// We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
	// to be the kernel stack, but break this into two pieces:
	//    * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
	//    * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
	//      the kernel overflows its stack, it will fault rather than
	//      overwrite memory.  Known as a "guard page".
	// Permissions: kernel RW, user NONE
	boot_map_region(kern_pgdir, KSTACKTOP - KSTKSIZE, KSTKSIZE,
	                PADDR(bootstack), PTE_W);
```
```c
	//////////////////////////////////////////////////////////////////////
	// Map all of physical memory at KERNBASE.
	// Ie. the VA range [KERNBASE, 2^32) should map to
	//    the PA range [0, 2^32 - KERNBASE)
	// We might not have 2^32 - KERNBASE bytes of physical memory, but
	// we just set up the mapping anyway.
	// Permissions: kernel RW, user NONE
	// 2^32 - KERNBASE = 0x10000000 bytes (256MB)
	boot_map_region(kern_pgdir, KERNBASE, 0x10000000, 0, PTE_W);
```
Question 2
What entries (rows) in the page directory have been filled in at this point?
| Entry | Base Virtual Address | Points to (logically) |
|-------|----------------------|-----------------------|
| 1023 | 0xffc00000 | Page table for top 4MB of phys memory |
| 1022 | 0xff800000 | … |
| … | … | … |
| 960 | 0xf0000000 | Page table for bottom 4MB of phys memory |
| 959 | 0xefc00000 | Page table for kernel stack |
| 957 | 0xef400000 | PD itself as a page table |
| 956 | 0xef000000 | Page table of pages array mapped at UPAGES |
| … | … | … |
| 2 | ? | ? |
| 1 | ? | ? |
| 0 | ? | ? |
We have placed the kernel and user environment in the same address space. Why will user programs not be able to read or write the kernel’s memory? What specific mechanisms protect the kernel memory?
Page-level protection: the U/S bit in page directory and page table entries. Kernel mappings are created without PTE_U, so any access to them from user mode (CPL 3) raises a page fault.
What is the maximum amount of physical memory that this operating system can support? Why?
2GB. The pages array lives in the 4MB (PTSIZE) window mapped at UPAGES, and each struct PageInfo is 8 bytes, so npages = 4MB / 8B = 524288, and the maximum amount of physical memory is npages * 4KB = 2GB.
How much space overhead is there for managing memory, if we actually had the maximum amount of physical memory? How is this overhead broken down?
- Size of pages = npages * sizeof(struct PageInfo) = 524288 * 8B = 4MB
- Size of kern_pgdir = PGSIZE = 4KB
- Total size of all page tables = 1K page tables * PGSIZE = 4MB

So the overhead is about 8MB plus 4KB.
Revisit the page table setup in kern/entry.S and kern/entrypgdir.c. Immediately after we turn on paging, EIP is still a low number (a little over 1MB). At what point do we transition to running at an EIP above KERNBASE? What makes it possible for us to continue executing at a low EIP between when we enable paging and when we begin running at an EIP above KERNBASE? Why is this transition necessary?
The `jmp *%eax` in kern/entry.S does the transition. We can keep executing at a low EIP in between because entry_pgdir maps both virtual [0, 4MB) and virtual [KERNBASE, KERNBASE+4MB) to physical [0, 4MB), so low addresses remain valid until we jump high. The transition is necessary because the kernel must run in the virtual space above KERNBASE, leaving the virtual space below for user processes.
Challenge
Extend the JOS kernel monitor with commands showmappings, setpermbits and dumpmem.
```c
int
mon_showmappings(int argc, char **argv, struct Trapframe *tf)
{
	if (argc != 3 && argc != 2) {
		cprintf("Wrong number of arguments.\n");
		cprintf("Usage: showmappings begin_vaddr [end_vaddr]\n");
		return 1;
	}

	uint32_t begin, end;
	struct PageInfo *pp;
	pte_t *pptr;
	char buf[15];

	begin = ROUNDDOWN((uint32_t) strtol(argv[1], NULL, 16), PGSIZE);
	if (argc == 2)
		end = begin;
	else
		end = ROUNDDOWN((uint32_t) strtol(argv[2], NULL, 16), PGSIZE);

	if (begin > 0xf7ffffff || end > 0xf7ffffff) {
		cprintf("Virtual address out of range.\n");
		return 1;
	}

	cprintf("%8s\t%8s\t%12s\n", "VA", "PA", "PERMBITS");
	for (; begin <= end; begin += PGSIZE) {
		pp = page_lookup(kern_pgdir, (void *) begin, &pptr);
		if (pp != NULL) {
			pageperm2str(*pptr, buf);
			cprintf("%08x\t%08x\t%12s\n", begin, page2pa(pp), buf);
		} else
			cprintf("%08x\t%8s\t%12s\n", begin, "none", "none");
	}
	return 0;
}
```
```c
int
mon_dumpmem(int argc, char **argv, struct Trapframe *tf)
{
	if (argc != 4) {
		cprintf("Wrong number of arguments.\n");
		cprintf("Usage: dumpmem -p/v begin_addr end_addr\n");
		return 1;
	}

	uint32_t begin, end;

	begin = (uint32_t) strtol(argv[2], NULL, 0);
	end = (uint32_t) strtol(argv[3], NULL, 0);

	if (argv[1][1] == 'p') {
		if ((PGNUM(begin) >= npages) || (PGNUM(end) >= npages)) {
			cprintf("Physical address out of range.\n");
			return 1;
		}
		begin = (uint32_t) KADDR(begin);
		end = (uint32_t) KADDR(end);
	}

	if (begin >= 0xf7ffffff || end >= 0xf7ffffff) {
		cprintf("Virtual address out of range.\n");
		return 1;
	}

	cprintf("Virtual\t\tPhysical\tMemory Contents\n");
	while (begin <= end) {
		int i;
		pte_t *pptr;

		cprintf("%08x\t", begin);
		if (page_lookup(kern_pgdir, (void *) begin, &pptr) == NULL || *pptr == 0) {
			cprintf("No Mapping\n");
			begin += PGSIZE - begin % PGSIZE;
			continue;
		}
```