
MIT 6.828 Lab2

Notes on MIT 6.828 Lab 2

Part 1: Physical Page Management

physical page allocator

Write the physical page allocator. It keeps track of which pages are free with a linked list of struct PageInfo objects (which, unlike xv6, are not embedded in the free pages themselves), each corresponding to a physical page.

Exercise 1

Implement the following functions.

boot_alloc()
mem_init() (only up to the call to check_page_free_list(1))
page_init()
page_alloc()
page_free()

boot_alloc() is a simple physical allocator which is used only while the kernel is setting up its virtual memory system.

static void *
boot_alloc(uint32_t n)
{
    static char *nextfree; // virtual address of next byte of free memory
    char *result;

    // Initialize nextfree if this is the first time.
    // 'end' is a magic symbol automatically generated by the linker,
    // which points to the end of the kernel's bss segment:
    // the first virtual address that the linker did *not* assign
    // to any kernel code or global variables.
    if (!nextfree) {
        extern char end[];
        nextfree = ROUNDUP((char *) end, PGSIZE);
    }

    // Allocate a chunk large enough to hold 'n' bytes, then update
    // nextfree. Make sure nextfree is kept aligned
    // to a multiple of PGSIZE.
    if ((uint32_t) nextfree - KERNBASE > npages * PGSIZE - n) {
        panic("out of memory");
    }
    result = nextfree;
    nextfree = ROUNDUP(nextfree + n, PGSIZE);
    return result;
}

In mem_init(), add code to allocate an array of npages struct PageInfos and store it in pages. The kernel uses this array to keep track of physical pages.

pages = (struct PageInfo *) boot_alloc(npages * sizeof(struct PageInfo));
memset(pages, 0, npages * sizeof(struct PageInfo));

page_init() initializes the pages array and builds the free list of physical pages.

void
page_init(void)
{
    size_t i;

    // page 0: in use (real-mode IDT and BIOS structures)
    pages[0].pp_ref = 1;
    pages[0].pp_link = NULL;

    // the rest of base memory: free
    for (i = 1; i < npages_basemem; i++) {
        pages[i].pp_ref = 0;
        pages[i].pp_link = page_free_list;
        page_free_list = &pages[i];
    }

    // IO hole [IOPHYSMEM, EXTPHYSMEM): never free
    for (i = PGNUM(IOPHYSMEM); i < PGNUM(EXTPHYSMEM); i++) {
        pages[i].pp_ref = 1;
        pages[i].pp_link = NULL;
    }

    extern char end[];
    physaddr_t end_of_kern = PADDR(ROUNDUP((char *) end, PGSIZE));
    // pages taken by the pages array itself, plus one page for kern_pgdir
    int size_of_table = ROUNDUP(npages * sizeof(struct PageInfo), PGSIZE) / PGSIZE;
    int end_of_inuse_page = PGNUM(end_of_kern) + 1 + size_of_table;

    // in-use extended memory: kernel image, kern_pgdir and the pages array
    for (i = PGNUM(EXTPHYSMEM); i < end_of_inuse_page; i++) {
        pages[i].pp_ref = 1;
        pages[i].pp_link = NULL;
    }

    // free extended memory
    for (i = end_of_inuse_page; i < npages; i++) {
        pages[i].pp_ref = 0;
        pages[i].pp_link = page_free_list;
        page_free_list = &pages[i];
    }
}

page_alloc() allocates a physical page from the free list. Note that it does not increment pp_ref — that is the caller's responsibility.

struct PageInfo *
page_alloc(int alloc_flags)
{
    if (page_free_list == NULL)
        return NULL;

    struct PageInfo *victim = page_free_list;

    page_free_list = victim->pp_link;
    victim->pp_link = NULL;

    if (alloc_flags & ALLOC_ZERO)
        memset(page2kva(victim), 0, PGSIZE);

    // pp_ref is not incremented here; that's the caller's job
    return victim;
}

page_free() returns a physical page to the free list. We need to check that the page is no longer in use (pp_ref is zero) and has not already been freed (pp_link is NULL).

void
page_free(struct PageInfo *pp)
{
    // Hint: You may want to panic if pp->pp_ref is nonzero or
    // pp->pp_link is not NULL.
    if (pp->pp_ref)
        panic("reference count non zero");

    if (pp->pp_link)
        panic("double free detected");

    pp->pp_link = page_free_list;
    page_free_list = pp;
}

Part 2: Virtual Memory

Virtual, Linear, and Physical Addresses

In x86 terminology, a virtual address consists of a segment selector and an offset within the segment. A linear address is what you get after segment translation but before page translation. A physical address is what you finally get after both segment and page translation and what ultimately goes out on the hardware bus to your RAM.

In our implementation, the GDT installed in boot/boot.S actually disables segment translation by setting all segment base addresses to 0 and limits to 0xffffffff. Hence the "selector" has no effect and the linear address always equals the offset of the virtual address.

In the previous lab, we installed a simple page table to map only 4MB of physical memory at 0x00100000 to the virtual space starting at 0xf0100000. This is sufficient for the kernel to start up, but we will need to set up the complete kernel space in the following part.
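After segmentation is neutralized, the paging hardware splits each 32-bit linear address into three fields. The sketch below reproduces the JOS index macros (they match the definitions in inc/mmu.h) so that the split can be checked standalone:

```c
#include <stdint.h>

// How the paging hardware splits a 32-bit linear address
// (matches the PDX/PTX/PGOFF macros in inc/mmu.h):
//
//  31        22 21        12 11          0
// +------------+------------+-------------+
// |  PDX(la)   |  PTX(la)   |  PGOFF(la)  |
// +------------+------------+-------------+
#define PDX(la)   ((((uint32_t) (la)) >> 22) & 0x3FF) // page directory index
#define PTX(la)   ((((uint32_t) (la)) >> 12) & 0x3FF) // page table index
#define PGOFF(la) (((uint32_t) (la)) & 0xFFF)         // offset within page
```

For example, the kernel's link address 0xf0100000 has PDX = 0x3c0 (directory entry 960), PTX = 0x100, and PGOFF = 0.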

Question 1

C type        Address type
T*            Virtual
uintptr_t     Virtual
physaddr_t    Physical

Assuming that the following JOS kernel code is correct, what type should variable x have, uintptr_t or physaddr_t?

mystery_t x;
char* value = return_a_pointer();
*value = 10;
x = (mystery_t) value;

The answer is uintptr_t: value is dereferenced by the program, so it must be a virtual address, and so is x after the cast.

Reference Counting

We keep a count of the number of references to each physical page in the pp_ref field of the struct PageInfo corresponding to the physical page.

When this count goes to zero for a physical page, that page can be freed because it is no longer used. In general, this count should be equal to the number of times the physical page appears below UTOP in all page tables (the mappings above UTOP are mostly set up at boot time by the kernel and should never be freed, so there’s no need to reference count them).

We’ll also use it to keep track of the number of pointers we keep to the page directory pages and, in turn, of the number of references the page directories have to page table pages.
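The glue between pp_ref and the free list is page_decref(), whose body below is consistent with the one-liner in kern/pmap.c; the surrounding struct and page_free() are stand-in definitions added here only so the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

// Stand-ins for the real definitions in inc/memlayout.h / kern/pmap.c,
// included only to make this sketch self-contained.
struct PageInfo {
    struct PageInfo *pp_link;
    uint16_t pp_ref;
};
static struct PageInfo *page_free_list;

static void
page_free(struct PageInfo *pp)
{
    pp->pp_link = page_free_list;
    page_free_list = pp;
}

// Drop one reference; return the page to the free list when the
// count reaches zero.
void
page_decref(struct PageInfo *pp)
{
    if (--pp->pp_ref == 0)
        page_free(pp);
}
```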

Page Table Management

Exercise 4

Implement the following functions

pgdir_walk()
boot_map_region()
page_lookup()
page_remove()
page_insert()

pgdir_walk() finds the page table entry for a given virtual address in the given page directory. If create is nonzero and the relevant page table does not exist, it allocates one. The function returns a pointer to the page table entry, or NULL on failure.

pte_t *
pgdir_walk(pde_t *pgdir, const void *va, int create)
{
    pde_t pde = pgdir[PDX(va)];
    pte_t *pgtbl_ptr;

    if (pde == 0) {
        if (!create)
            return NULL;

        struct PageInfo *new_pgtbl_info = page_alloc(0);
        if (new_pgtbl_info == NULL)
            return NULL;

        new_pgtbl_info->pp_ref++;
        memset(page2kva(new_pgtbl_info), 0, PGSIZE);

        pde = pgdir[PDX(va)] = page2pa(new_pgtbl_info) | PTE_W | PTE_P | PTE_U;
    }

    pgtbl_ptr = (pte_t *) KADDR(PTE_ADDR(pde));
    return pgtbl_ptr + PTX(va);
}

boot_map_region() maps the virtual address range [va, va+size) to the physical range [pa, pa+size) in the page table rooted at pgdir. It is intended for the static mappings above UTOP, so the pp_ref field of the mapped pages is not changed.

static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
    // mind the alignment: round the page count up
    int cnt = size / PGSIZE;
    if (size % PGSIZE)
        cnt++;
    for (int i = 0; i < cnt; ++i) {
        pte_t *pte_ptr = pgdir_walk(pgdir, (const void *) va, 1);
        if (pte_ptr == NULL)
            panic("boot_map_region: out of memory");
        *pte_ptr = pa | perm | PTE_P;
        va += PGSIZE;
        pa += PGSIZE;
    }
}

page_lookup() returns the page mapped at virtual address va, or NULL if there is none. If pte_store is not NULL, the address of the PTE for this page is stored in it.

struct PageInfo *
page_lookup(pde_t *pgdir, void *va, pte_t **pte_store)
{
    pte_t *pte_ptr = pgdir_walk(pgdir, va, 0);

    if (pte_ptr != NULL && *pte_ptr != 0) {
        if (pte_store != NULL)
            *pte_store = pte_ptr;
        return pa2page(PTE_ADDR(*pte_ptr));
    }

    return NULL;
}

page_remove() unmaps the page at virtual address va, calling page_decref() to drop the reference and tlb_invalidate() to flush the stale TLB entry.

void
page_remove(pde_t *pgdir, void *va)
{
    pte_t *pte_ptr = NULL;
    struct PageInfo *page = page_lookup(pgdir, va, &pte_ptr);

    // page_lookup() sets pte_ptr only when a mapping exists
    if (pte_ptr != NULL) {
        *pte_ptr = 0;
        tlb_invalidate(pgdir, va);
        page_decref(page);
    }
}

page_insert() maps the physical page pp at virtual address va. If a page is already mapped at va, it is removed with page_remove(), and pp_ref is incremented if the insertion succeeds.
One corner case requires attention: if the same pp is re-inserted at the same virtual address in the same pgdir and we call page_remove() before incrementing pp_ref, the page may be freed by the page_decref() call inside page_remove().
By incrementing pp_ref before calling page_remove(), we handle the corner case elegantly.

int
page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
    pte_t *pte_ptr = pgdir_walk(pgdir, va, 1);

    if (pte_ptr == NULL)
        return -E_NO_MEM;

    // increment pp_ref before page_remove() to handle the corner case
    pp->pp_ref++;

    if (*pte_ptr != 0)
        page_remove(pgdir, va);

    *pte_ptr = page2pa(pp) | perm | PTE_P;

    return 0;
}

Part 3: Kernel Address Space

The virtual space layout is like this.

/*
 * Virtual memory map:                                Permissions
 *                                                    kernel/user
 *
 *    4 Gig -------->  +------------------------------+
 *                     |                              | RW/--
 *                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *                     :              .               :
 *                     :              .               :
 *                     :              .               :
 *                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
 *                     |                              | RW/--
 *                     |   Remapped Physical Memory   | RW/--
 *                     |                              | RW/--
 *    KERNBASE, ---->  +------------------------------+ 0xf0000000      --+
 *    KSTACKTOP        |     CPU0's Kernel Stack      | RW/--  KSTKSIZE   |
 *                     | - - - - - - - - - - - - - - -|                   |
 *                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
 *                     +------------------------------+                   |
 *                     |     CPU1's Kernel Stack      | RW/--  KSTKSIZE   |
 *                     | - - - - - - - - - - - - - - -|                 PTSIZE
 *                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
 *                     +------------------------------+                   |
 *                     :              .               :                   |
 *                     :              .               :                   |
 *    MMIOLIM ------>  +------------------------------+ 0xefc00000      --+
 *                     |       Memory-mapped I/O      | RW/--  PTSIZE
 * ULIM, MMIOBASE -->  +------------------------------+ 0xef800000
 *                     |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
 *    UVPT      ---->  +------------------------------+ 0xef400000
 *                     |          RO PAGES            | R-/R-  PTSIZE
 *    UPAGES    ---->  +------------------------------+ 0xef000000
 *                     |           RO ENVS            | R-/R-  PTSIZE
 * UTOP,UENVS ------>  +------------------------------+ 0xeec00000
 * UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
 *                     +------------------------------+ 0xeebff000
 *                     |       Empty Memory (*)       | --/--  PGSIZE
 *    USTACKTOP --->   +------------------------------+ 0xeebfe000
 *                     |      Normal User Stack       | RW/RW  PGSIZE
 *                     +------------------------------+ 0xeebfd000
 *                     |                              |
 *                     |                              |
 *                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *                     .                              .
 *                     .                              .
 *                     .                              .
 *                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
 *                     |     Program Data & Heap      |
 *    UTEXT -------->  +------------------------------+ 0x00800000
 *    PFTEMP ------->  |       Empty Memory (*)       |        PTSIZE
 *                     |                              |
 *    UTEMP -------->  +------------------------------+ 0x00400000      --+
 *                     |       Empty Memory (*)       |                   |
 *                     | - - - - - - - - - - - - - - -|                   |
 *                     |  User STAB Data (optional)   |                 PTSIZE
 *    USTABDATA ---->  +------------------------------+ 0x00200000        |
 *                     |       Empty Memory (*)       |                   |
 *    0 ------------>  +------------------------------+                 --+
 *
 * (*) Note: The kernel ensures that "Invalid Memory" is *never* mapped.
 *     "Empty Memory" is normally unmapped, but user programs may map pages
 *     there if desired.  JOS user programs map pages temporarily at UTEMP.
 */

Permissions and Fault Isolation

We use the permission bits in the page table to control access and implement page-level isolation.

Page Directory Entry    Page Table Entry    Combined Protection
    U/S  R/W                U/S  R/W            U/S  R/W

    S-0  R-0                S-0  R-0            S    x
    S-0  R-0                S-0  W-1            S    x
    S-0  R-0                U-1  R-0            S    x
    S-0  R-0                U-1  W-1            S    x
    S-0  W-1                S-0  R-0            S    x
    S-0  W-1                S-0  W-1            S    x
    S-0  W-1                U-1  R-0            S    x
    S-0  W-1                U-1  W-1            S    x
    U-1  R-0                S-0  R-0            S    x
    U-1  R-0                S-0  W-1            S    x
    U-1  R-0                U-1  R-0            U    R
    U-1  R-0                U-1  W-1            U    R
    U-1  W-1                S-0  R-0            S    x
    U-1  W-1                S-0  W-1            S    x
    U-1  W-1                U-1  R-0            U    R
    U-1  W-1                U-1  W-1            U    W
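For user accesses, the table reduces to a bitwise AND of the PDE and PTE bits: the page is user-accessible only if U is set in both, and user-writable only if W is also set in both. A minimal sketch under that reading (the PTE_* values match inc/mmu.h; combined_user_perm is a hypothetical helper, not a JOS function):

```c
#include <stdint.h>

#define PTE_P 0x001 // present
#define PTE_W 0x002 // writable
#define PTE_U 0x004 // user-accessible

// Effective user permission of a mapping, per the table above:
// no user access unless U is set in both levels; W only if set in both.
// (Supervisor accesses ignore these bits unless CR0_WP is set.)
static uint32_t
combined_user_perm(uint32_t pde, uint32_t pte)
{
    uint32_t both = pde & pte;
    if (!(both & PTE_U))
        return 0; // supervisor-only: no user access at all
    return both & (PTE_U | PTE_W);
}
```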

Initializing the Kernel Address Space

Now we use boot_map_region(), which we just implemented, to initialize the kernel address space above UTOP.

Exercise 5

Fill in the missing code in mem_init() after the call to check_page().

//////////////////////////////////////////////////////////////////////
// Map 'pages' read-only by the user at linear address UPAGES
// Permissions:
//    - the new image at UPAGES -- kernel R, user R
//      (ie. perm = PTE_U | PTE_P)
//    - pages itself -- kernel RW, user NONE
boot_map_region(kern_pgdir, UPAGES, PTSIZE, PADDR(pages), PTE_U);

//////////////////////////////////////////////////////////////////////
// Use the physical memory that 'bootstack' refers to as the kernel
// stack. The kernel stack grows down from virtual address KSTACKTOP.
// We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
// to be the kernel stack, but break this into two pieces:
//    * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
//    * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
//      the kernel overflows its stack, it will fault rather than
//      overwrite memory. Known as a "guard page".
// Permissions: kernel RW, user NONE
boot_map_region(kern_pgdir, KSTACKTOP - KSTKSIZE, KSTKSIZE, PADDR(bootstack), PTE_W);

//////////////////////////////////////////////////////////////////////
// Map all of physical memory at KERNBASE.
// Ie. the VA range [KERNBASE, 2^32) should map to
// the PA range [0, 2^32 - KERNBASE)
// We might not have 2^32 - KERNBASE bytes of physical memory, but
// we just set up the mapping anyway.
// Permissions: kernel RW, user NONE
boot_map_region(kern_pgdir, KERNBASE, 0x10000000, 0, PTE_W); // 2^32 - KERNBASE

Question 2

  • What entries (rows) in the page directory have been filled in at this point?

    Entry  Base Virtual Address  Points to (logically)
    1023   0xffc00000            Page table for top 4MB of phys memory
    1022   0xff800000            Page table for next 4MB of phys memory
    960    0xf0000000            Page table for bottom 4MB of phys memory
    959    0xefc00000            Page table for kernel stack
    957    0xef400000            PD itself as a page table (UVPT)
    956    0xef000000            Page table of pages mapped at UPAGES
    2      ?                     ?
    1      ?                     ?
    0      ?                     ?
  • We have placed the kernel and user environment in the same address space. Why will user programs not be able to read or write the kernel’s memory? What specific mechanisms protect the kernel memory?

    • Page-level protection: the U/S bits in PDEs and PTEs. Kernel mappings leave U clear, so any user-mode access to them raises a page fault.
  • What is the maximum amount of physical memory that this operating system can support? Why?

    • 2GB. npages = 524288, so the maximum amount of physical memory = npages * 4KB = 2GB. The limit comes from the pages array: it is mapped read-only at UPAGES in a region of size PTSIZE (4MB), and with sizeof(struct PageInfo) == 8 that region can describe at most 4MB / 8 = 524288 pages.
    pages = (struct PageInfo *) boot_alloc(npages * sizeof(struct PageInfo));
  • How much space overhead is there for managing memory, if we actually had the maximum amount of physical memory? How is this overhead broken down?

    • Size of pages = npages * sizeof(struct PageInfo);
    • Size of kern_pgdir = PGSIZE
    • Total size of all page tables = PGSIZE * 1K
  • Revisit the page table setup in kern/entry.S and kern/entrypgdir.c. Immediately after we turn on paging, EIP is still a low number (a little over 1MB). At what point do we transition to running at an EIP above KERNBASE? What makes it possible for us to continue executing at a low EIP between when we enable paging and when we begin running at an EIP above KERNBASE? Why is this transition necessary?
    • The jmp *%eax in kern/entry.S does the transition. Execution can continue at a low EIP because entry_pgdir maps virtual addresses [0, 4MB) to physical addresses [0, 4MB), in addition to mapping [KERNBASE, KERNBASE+4MB) there. The transition is necessary because the kernel must run in the virtual space above KERNBASE, leaving the space below for user processes.
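The overhead figures above can be checked with a little arithmetic. A sketch, assuming sizeof(struct PageInfo) == 8 on the 32-bit build (a 4-byte pp_link pointer plus pp_ref padded to 4 bytes):

```c
#include <stdint.h>

// Worked overhead arithmetic for the maximum 2GB of physical memory.
#define PGSIZE      4096u
#define NPAGES      524288u                         /* 2GB / 4KB */
#define PAGES_ARRAY (NPAGES * 8u)                   /* pages[]: 4MB */
#define PGDIR_SZ    PGSIZE                          /* kern_pgdir: one page */
#define PGTABLES    (1024u * PGSIZE)                /* 1024 page tables: 4MB */
#define TOTAL       (PAGES_ARRAY + PGDIR_SZ + PGTABLES)
```

So the total bookkeeping cost is about 8MB plus one page: 4MB for pages[], 4KB for the page directory, and 4MB for a full set of page tables.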

Challenge

Extend the JOS kernel monitor with commands showmappings, setpermbits and dumpmem.

static inline char *
pageperm2str(pte_t pte, char *buf)
{
    int i;
    static const char *str[] = { "_________SR_", "AVLGPDACTUWP" };

    // one character per PTE flag, bit 11 down to bit 0
    for (i = 0; i < 12; i++)
        buf[i] = str[(pte >> (11 - i)) & 0x1][i];
    buf[i] = '\0';

    return buf;
}

static inline int
str2pageperm(const char *buf)
{
    int perm = 0;
    while (*buf != '\0') {
        switch (*buf++) {
        case 'p': case 'P': perm |= PTE_P;   break;
        case 'w': case 'W': perm |= PTE_W;   break;
        case 'u': case 'U': perm |= PTE_U;   break;
        case 't': case 'T': perm |= PTE_PWT; break;
        case 'c': case 'C': perm |= PTE_PCD; break;
        case 'a': case 'A': perm |= PTE_A;   break;
        case 'd': case 'D': perm |= PTE_D;   break;
        case 'g': case 'G': perm |= PTE_G;   break;
        default: break;
        }
    }
    return perm;
}
int
mon_showmappings(int argc, char **argv, struct Trapframe *tf)
{
    if (argc != 3 && argc != 2) {
        cprintf("Wrong number of arguments.\n");
        cprintf("Usage: showmappings begin_vaddr [end_vaddr]\n");
        return 1;
    }

    uint32_t begin, end;
    struct PageInfo *pp;
    pte_t *pptr;
    char buf[15];

    begin = ROUNDDOWN((uint32_t) strtol(argv[1], NULL, 16), PGSIZE);
    if (argc == 2)
        end = begin;
    else
        end = ROUNDDOWN((uint32_t) strtol(argv[2], NULL, 16), PGSIZE);

    if (begin > 0xf7ffffff || end > 0xf7ffffff) {
        cprintf("Virtual address out of range.\n");
        return 1;
    }

    cprintf("%8s\t%8s\t%12s\n", "VA", "PA", "PERMBITS");
    for (; begin <= end; begin += PGSIZE) {
        pp = page_lookup(kern_pgdir, (void *) begin, &pptr);
        if (pp != NULL) {
            pageperm2str(*pptr, buf);
            cprintf("%08x\t%08x\t%12s\n", begin, page2pa(pp), buf);
        } else
            cprintf("%08x\t%8s\t%12s\n", begin, "none", "none");
    }
    return 0;
}
int
mon_setpermbits(int argc, char **argv, struct Trapframe *tf)
{
    if (argc != 3) {
        cprintf("Wrong number of arguments.\n");
        cprintf("Usage: setpermbits +/-permbits vaddr\n");
        return 1;
    }

    uint32_t begin;
    struct PageInfo *pp;
    pte_t *pptr, pte;
    char buf_old[15], buf_new[15];

    begin = ROUNDDOWN((uint32_t) strtol(argv[2], NULL, 0), PGSIZE);

    if (begin > 0xf7ffffff) {
        cprintf("Virtual address out of range.\n");
        return 1;
    }

    pp = page_lookup(kern_pgdir, (void *) begin, &pptr);

    if (pp == NULL) {
        cprintf("No mapping exists.\n");
        return 1;
    }

    pte = *pptr;
    if (*argv[1] == '+')
        *pptr |= str2pageperm(argv[1] + 1);
    else if (*argv[1] == '-')
        *pptr &= ~str2pageperm(argv[1] + 1);

    cprintf("Virtual\t\tPhysical\tOld Permbits\tNew Permbits\t\n");
    cprintf("%08x\t%08x\t%12s\t%12s\n", begin, page2pa(pp),
            pageperm2str(pte, buf_old), pageperm2str(*pptr, buf_new));

    return 0;
}
int
mon_dumpmem(int argc, char **argv, struct Trapframe *tf)
{
    if (argc != 4) {
        cprintf("Wrong number of arguments.\n");
        cprintf("Usage: dumpmem -p/v begin_addr end_addr\n");
        return 1;
    }

    uint32_t begin, end;

    begin = (uint32_t) strtol(argv[2], NULL, 0);
    end = (uint32_t) strtol(argv[3], NULL, 0);

    if (argv[1][1] == 'p') {
        if (PGNUM(begin) >= npages || PGNUM(end) >= npages) {
            cprintf("Physical address out of range.\n");
            return 1;
        }
        begin = (uint32_t) KADDR(begin);
        end = (uint32_t) KADDR(end);
    }

    if (begin >= 0xf7ffffff || end >= 0xf7ffffff) {
        cprintf("Virtual address out of range.\n");
        return 1;
    }

    cprintf("Virtual\t\tPhysical\tMemory Contents\n");
    while (begin <= end) {
        int i;
        pte_t *pptr;

        cprintf("%08x\t", begin);
        if (page_lookup(kern_pgdir, (void *) begin, &pptr) == NULL || *pptr == 0) {
            cprintf("No Mapping\n");
            begin += PGSIZE - begin % PGSIZE; // skip to the next page
            continue;
        }

        cprintf("%08x\t", PTE_ADDR(*pptr) | PGOFF(begin));

        for (i = 0; i < 16; i++, begin++)
            cprintf("%02x ", *(unsigned char *) begin);

        cprintf("\n");
    }

    return 0;
}