MIT 6.828 Lab1
Notes on MIT 6.828 Lab 1.
Part 1: PC Bootstrap
The PC’s Physical Address Space
A PC’s physical address space is hard-wired to have the following general layout.
+------------------+ <- 0xFFFFFFFF (4GB)
The ROM BIOS
The IBM PC starts with CS = 0xf000 and IP = 0xfff0, executing at physical address 0x000ffff0, which is at the very top of the 64KB area reserved for the ROM BIOS.
In real mode (the mode the PC starts off in), address translation works according to the formula: physical address = 16 * segment + offset.
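As a quick check of the formula, here is a minimal C sketch. The helper name real_mode_phys is hypothetical and not part of the lab code.

```c
#include <stdint.h>

/* Hypothetical helper: real-mode translation,
 * physical address = 16 * segment + offset. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Calling real_mode_phys(0xf000, 0xfff0) gives 0x000ffff0, the address where the BIOS starts executing. The result can exceed 0xfffff (for example, 0xffff:0xffff gives 0x10ffef), so real hardware may wrap or truncate it depending on the A20 line; the sketch ignores that.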
The BIOS in a PC is “hard-wired” to the physical address range 0x000f0000-0x000fffff. This design ensures that the BIOS always gets control of the machine first after power-up or any system restart.
Exercise 2
Generally, the BIOS performs the following tasks:
- Sets up an IDT (Interrupt Descriptor Table, i.e. the interrupt vector table)
- Initializes various devices such as the VGA display
- After initializing the PCI bus and all the important devices the BIOS knows about, it searches for a bootable device such as a floppy, hard drive, or CD-ROM. When it finds a bootable disk, the BIOS reads the boot loader from the disk and transfers control to it.
Part 2: The Boot Loader
The Boot loader
Floppy and hard disks for PCs are divided into 512 byte regions called sectors. The first sector of a bootable disk is called the boot sector.
When the BIOS finds a bootable floppy or hard disk, it loads the 512-byte boot sector into memory at physical addresses 0x7c00 through 0x7dff, and then uses a jmp instruction to set the CS:IP to 0000:7c00, passing control to the boot loader. Like the BIOS load address, these addresses are fairly arbitrary, but they are fixed and standardized for PCs.
The boot loader performs the following functions:
First, it switches the processor from real mode to 32-bit protected mode, because it is only in this mode that software can access all the memory above 1MB in the processor’s physical address space. The boot loader loads the GDT and then sets the CR0_PE_ON bit in register cr0.
# Switch from real to protected mode, using a bootstrap GDT
# and segment translation that makes virtual addresses
# identical to their physical addresses, so that the
# effective memory map does not change during the switch.
lgdt gdtdesc
movl %cr0, %eax
orl $CR0_PE_ON, %eax
movl %eax, %cr0
# Jump to next instruction, but in 32-bit code segment.
# Switches processor into 32-bit mode.
ljmp $PROT_MODE_CSEG, $protcseg
# Bootstrap GDT
.p2align 2 # force 4 byte alignment
gdt:
SEG_NULL # null seg
SEG(STA_X|STA_R, 0x0, 0xffffffff) # code seg
SEG(STA_W, 0x0, 0xffffffff) # data seg
gdtdesc:
.word 0x17 # sizeof(gdt) - 1
.long gdt # address gdt
The GDT (Global Descriptor Table) is a central structure in protected mode.
Typically, we access a memory address by selector:offset. In real mode, a selector is a paragraph number of physical memory. In protected mode, a selector value is an index into a descriptor table, and each segment is assigned an entry in a descriptor table.
The whole system has one Global Descriptor Table. Its base address and limit are held in register GDTR. During protected mode initialization, we have to use the instruction lgdt to load GDTR with a new value.
The structure of a GDT entry is as follows. It specifies the base address and limit of the segment and its access properties (that’s why it’s called protected mode).
struc gdt_entry_struct
limit_low: resb 2
base_low: resb 2
base_middle: resb 1
access: resb 1
granularity: resb 1
base_high: resb 1
endstruc
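To make the field layout concrete, here is a hedged C sketch that packs a base, limit, access byte, and flag nibble into the same 8-byte layout. The struct mirrors the field names above; make_gdt_entry is a hypothetical helper, not something from the lab sources.

```c
#include <stdint.h>

/* Same field layout as gdt_entry_struct above: 8 bytes per descriptor. */
struct gdt_entry {
    uint16_t limit_low;    /* limit bits 0..15 */
    uint16_t base_low;     /* base bits 0..15 */
    uint8_t  base_middle;  /* base bits 16..23 */
    uint8_t  access;       /* present bit, privilege level, segment type */
    uint8_t  granularity;  /* flags in the high nibble, limit bits 16..19 in the low nibble */
    uint8_t  base_high;    /* base bits 24..31 */
};

_Static_assert(sizeof(struct gdt_entry) == 8, "a GDT descriptor is 8 bytes");

/* Hypothetical helper. `limit` is the 20-bit limit field; `flags` is the
 * 4-bit flag nibble (0xC = 4KB granularity + 32-bit segment). */
struct gdt_entry make_gdt_entry(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    struct gdt_entry e;
    e.limit_low   = (uint16_t)(limit & 0xFFFF);
    e.base_low    = (uint16_t)(base & 0xFFFF);
    e.base_middle = (uint8_t)((base >> 16) & 0xFF);
    e.access      = access;
    e.granularity = (uint8_t)(((flags & 0x0F) << 4) | ((limit >> 16) & 0x0F));
    e.base_high   = (uint8_t)((base >> 24) & 0xFF);
    return e;
}
```

A flat 4GB code segment would be make_gdt_entry(0x0, 0xFFFFF, 0x9A, 0xC), which yields a granularity byte of 0xCF. Three such 8-byte entries (null, code, data) occupy 24 bytes, which matches the 0x17 (sizeof(gdt) - 1) stored in gdtdesc.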
Second, the boot loader reads the kernel from the hard disk by directly accessing the IDE disk device registers via the x86’s special I/O instructions.
// load each program segment (ignores ph flags)
ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
eph = ph + ELFHDR->e_phnum;
for (; ph < eph; ph++)
// p_pa is the load address of this segment (as well
// as the physical address)
readseg(ph->p_pa, ph->p_memsz, ph->p_offset);
// call the entry point from the ELF header
// note: does not return!
((void (*)(void)) (ELFHDR->e_entry))();
Exercise 3
- After the instruction ljmp $PROT_MODE_CSEG, $protcseg, the processor starts to work in 32-bit mode. The reason is that ljmp loads the value of $PROT_MODE_CSEG into CS, and the value of $protcseg into EIP. After that, the processor uses the value of CS as an index into the GDT to find the corresponding segment descriptor.
- The last instruction of the boot loader is call *0x10018, and the first instruction of the kernel is movw $0x1234,0x472.
- The first instruction of the kernel is in file kern/entry.S, at address 0x10000c.
- The boot loader reads ELFHDR->e_phnum from the kernel’s ELF header to learn how many segments there are, and uses ph->p_memsz to decide how many sectors to read per segment.
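The sector arithmetic behind the last answer can be sketched as follows. This is modeled on my reading of readseg in boot/main.c (round the destination down to a sector boundary, read whole sectors until the end address is covered, and treat the kernel image as starting at disk sector 1); the function names are hypothetical and the exact rounding should be checked against the source.

```c
#include <stdint.h>

#define SECTSIZE 512u  /* sector size, as stated in the text above */

/* Number of sector reads needed to cover [pa, pa + memsz), assuming the
 * start address is first rounded down to a sector boundary. */
uint32_t sectors_for_segment(uint32_t pa, uint32_t memsz)
{
    uint32_t end_pa = pa + memsz;
    uint32_t start  = pa & ~(SECTSIZE - 1);
    return (end_pa - start + SECTSIZE - 1) / SECTSIZE;
}

/* First disk sector for a byte offset into the kernel image, assuming the
 * image starts at sector 1 (sector 0 holds the boot sector). */
uint32_t first_sector(uint32_t offset)
{
    return offset / SECTSIZE + 1;
}
```

For a segment with p_pa = 0x100000 and p_memsz = 0x1000 this gives 8 sectors. Because whole sectors are read, a segment may pull in a few bytes more than p_memsz.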
Loading the kernel
We can examine the ELF section headers like this:
❯ objdump -h obj/kern/kernel
- LMA (load address): The load address of a section is the memory address at which that section should be loaded into memory.
- VMA (link address): The link address of a section is the memory address from which the section expects to execute.
Typically, the link and load addresses are the same.
The boot loader uses the ELF program headers to decide how to load the sections. The program headers specify which parts of the ELF object to load into memory and the destination address each should occupy.
We can inspect the program header like this:
❯ objdump -x obj/kern/kernel
vaddr means virtual address and paddr means physical address.
Exercise 5
We can change the link address of the boot loader by modifying the -Ttext option in boot/Makefrag, for example to 0x7d00.
After changing this, the boot loader breaks at ljmp $PROT_MODE_CSEG, $protcseg because it is executing from an address (0x7c00) that is different from the address from which it expects to execute (0x7d00).
(gdb) si
Exercise 6
Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel.
The two results are different because the boot loader loads the kernel into that address.
Part 3: The Kernel
Using virtual memory to work around position dependence
As we inspected above, there was a (rather large) disparity between the kernel’s link address (as printed by objdump) and its load address.
Operating system kernels often like to be linked and run at a very high virtual address, such as 0xf0100000, in order to leave the lower part of the processor’s virtual address space for user programs to use.
We use the processor’s memory management hardware to map virtual address 0xf0100000 (the link address at which the kernel code expects to run) to physical address 0x00100000 (where the boot loader loaded the kernel into physical memory).
For now, we’ll just map the first 4MB of physical memory, using the hand-written, statically-initialized page directory and page table in kern/entrypgdir.c.
After kern/entry.S sets the CR0_PG flag, paging is enabled and memory references are virtual addresses that get translated by the virtual memory hardware to physical addresses. entry_pgdir translates virtual addresses in the range 0xf0000000 through 0xf0400000 to physical addresses 0x00000000 through 0x00400000, as well as virtual addresses 0x00000000 through 0x00400000 to physical addresses 0x00000000 through 0x00400000.
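The double mapping can be illustrated with a small user-space simulation of a two-level x86 page table. This is only a sketch under stated assumptions: the sim_ names are hypothetical, the directory holds C pointers rather than physical addresses (the real table in kern/entrypgdir.c stores physical addresses), and KERNBASE, PTE_P and PTE_W are assumed to match the constants used by the lab headers.

```c
#include <stdint.h>
#include <stddef.h>

#define KERNBASE   0xF0000000u
#define NPDENTRIES 1024
#define NPTENTRIES 1024
#define PTE_P      0x001u  /* present */
#define PTE_W      0x002u  /* writable */

/* One page table covers 1024 * 4KB = 4MB. */
static uint32_t sim_pgtable[NPTENTRIES];
/* Simplification: directory slots hold C pointers, not physical addresses. */
static uint32_t *sim_pgdir[NPDENTRIES];

void sim_build_entry_pgdir(void)
{
    for (uint32_t i = 0; i < NPTENTRIES; i++)
        sim_pgtable[i] = (i << 12) | PTE_P | PTE_W;
    sim_pgdir[0] = sim_pgtable;               /* VA [0, 4MB) */
    sim_pgdir[KERNBASE >> 22] = sim_pgtable;  /* VA [KERNBASE, KERNBASE + 4MB) */
}

/* Returns 1 and stores the physical address on success, 0 if unmapped. */
int sim_translate(uint32_t va, uint32_t *pa)
{
    uint32_t *pt = sim_pgdir[va >> 22];       /* top 10 bits pick the directory slot */
    if (pt == NULL)
        return 0;
    uint32_t pte = pt[(va >> 12) & 0x3FF];    /* next 10 bits pick the table entry */
    if (!(pte & PTE_P))
        return 0;
    *pa = (pte & ~0xFFFu) | (va & 0xFFFu);    /* frame address + page offset */
    return 1;
}
```

After sim_build_entry_pgdir(), both 0xf0100000 and 0x00100000 translate to physical 0x00100000, while 0xf0400000 (just past the mapped 4MB) is unmapped.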
Exercise 7
We can examine memory at 0x00100000 and at 0xf0100000 before and after the movl %eax, %cr0.
=> 0x100025: mov %eax,%cr0
The first instruction that would fail if the mapping weren’t in place is mov $relocated, %eax.
Formatted Printing to the Console
The formatted print function int cprintf(const char *fmt, ...) is implemented in kern/printf.c, which relies on lib/printfmt.c and kern/console.c.
Exercise 8
Complete the code of vprintfmt in lib/printfmt.c to print octal numbers using patterns of the form “%o”.
// (unsigned) octal
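The conversion itself is ordinary base conversion. The following standalone C sketch shows the idea for any base from 2 to 16. It is not the lab's code: format_unsigned is a hypothetical helper that writes into a buffer, whereas the kernel version emits characters through a putch callback.

```c
#include <stddef.h>

/* Hypothetical helper: writes `num` in `base` (2..16) into `out` as a
 * NUL-terminated string. Returns the digit count, or 0 if the arguments
 * are invalid or the buffer is too small. */
size_t format_unsigned(unsigned long long num, unsigned base,
                       char *out, size_t cap)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[64];  /* enough for 64 binary digits */
    size_t n = 0;

    if (out == NULL || base < 2 || base > 16)
        return 0;
    do {
        tmp[n++] = digits[num % base];  /* least significant digit first */
        num /= base;
    } while (num != 0);
    if (n + 1 > cap)
        return 0;
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];        /* reverse into the output */
    out[n] = '\0';
    return n;
}
```

With base 8, the value 8 is written as "10" and 511 as "777"; with base 16, 57616 is written as "e110".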
- The interface between printf.c and console.c is the function void cputchar(int c) that console.c exports. printf.c calls it in static void putch(int ch, int *cnt), and passes a pointer to putch as an argument to vprintfmt((void*)putch, &cnt, fmt, ap), which is called in function vcprintf. vcprintf, in turn, is called by function cprintf.
- The following code is in function static void cga_putc(int c), which is used to output characters to the CGA/VGA display. The purpose of this piece of code is to handle the case where the cursor position runs past the end of the screen. crt_pos is the position of the next character to be output, in other words, the position of the cursor. CRT_SIZE = CRT_ROWS * CRT_COLS is the maximum number of characters that can be displayed on the screen. The code moves all the characters on the screen up by one row (using memmove), fills the last row with 0x0700 | ' ', and ends by recalculating crt_pos.
// What is the purpose of this?
if (crt_pos >= CRT_SIZE) {
int i;
memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
crt_buf[i] = 0x0700 | ' ';
crt_pos -= CRT_COLS;
}
- fmt points to the address of the format string "x %d, y %x, z %d\n".
vcprintf(fmt, ap); // called by cprintf
cons_putc('x');
cons_putc(' ');
va_arg(*ap, int);
// now ap points to &y
cons_putc('1');
cons_putc(',');
cons_putc(' ');
cons_putc('y');
cons_putc(' ');
va_arg(*ap, int);
// now ap points to &z
cons_putc('3');
cons_putc(',');
cons_putc(' ');
cons_putc('z');
cons_putc(' ');
va_arg(*ap, int);
// now ap points to (char *)&z + 4
cons_putc('4');
cons_putc('\n');
- The output is He110 World. In big-endian, i should be set to 0x726c6400. We don’t need to change 57616.
- The 4 bytes of data on the stack right after 3 (where the missing argument would be) will be treated as an integer and output.
- We can change its interface so that when the function is called, arguments are passed in reverse order.
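The He110 World answer can be checked without depending on the host's byte order. The sketch below assumes the lab's call is cprintf("H%x Wo%s", 57616, &i) with i = 0x00646c72; the helpers are hypothetical and lay out the bytes of i explicitly before formatting with the standard snprintf.

```c
#include <stdint.h>
#include <stdio.h>

/* Store v in little-endian byte order (as on x86), plus a terminating NUL. */
void store_le32(uint32_t v, char out[5])
{
    for (int k = 0; k < 4; k++)
        out[k] = (char)((v >> (8 * k)) & 0xFF);
    out[4] = '\0';
}

/* Store v in big-endian byte order, plus a terminating NUL. */
void store_be32(uint32_t v, char out[5])
{
    for (int k = 0; k < 4; k++)
        out[k] = (char)((v >> (8 * (3 - k))) & 0xFF);
    out[4] = '\0';
}

/* Formats the "H%x Wo%s" call given the in-memory bytes of i. */
int format_hello(char *dst, size_t cap, const char *i_bytes)
{
    return snprintf(dst, cap, "H%x Wo%s", 57616u, i_bytes);
}
```

store_le32(0x00646c72, ...) and store_be32(0x726c6400, ...) both produce the bytes 'r', 'l', 'd', '\0', so either way the formatted string is He110 World. The %x part does not depend on byte order because 57616 is passed as a value, not read from memory byte by byte.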
Challenge: Enhance the console to allow text to be printed in different colors. Left to be done.
The Stack
Exercise 9
The kernel initializes its stack in kern/entry.S.
# Clear the frame pointer register (EBP)
We can see the definition of bootstacktop in the data section of entry.S. The kernel reserves the space for the stack using .space KSTKSIZE.
.data
According to the debugging results, the stack is located just above entry_pgtable, and esp initially points to 0xf0110000. This is the top of the stack (its highest address); the stack grows downward from there.
(gdb) si
Various x86 instructions, such as call, are “hard-wired” to use the esp register.
The ebp (base pointer) register, in contrast, is associated with the stack primarily by software convention. On entry to a C function, the function’s prologue code normally saves the previous function’s base pointer by pushing it onto the stack, and then copies the current esp value into ebp for the duration of the function.
Exercise 10
The address of the test_backtrace function is 0xf0100040.
First, it is called by function void i386_init(void).
// Test the stack backtrace function (lab 1 only)
It also recursively calls itself.
cprintf("entering test_backtrace %d\n", x);
Five 32-bit words are pushed on the stack by each recursive nesting level of test_backtrace: the saved ebx, saved esi, saved ebp, saved eip, and the argument x.
Exercise 11
Implement the mon_backtrace() function to print the stack backtrace, and hook this function into the kernel monitor’s command list.
static struct Command commands[] = {
K> backtrace
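The core of a backtrace is following the chain of saved ebp values. The sketch below simulates that walk over an array standing in for stack memory, so it can run in user space. It assumes the usual x86 frame layout (saved ebp at ebp, return address at ebp + 4, first argument at ebp + 8) and that the chain ends at ebp == 0, which fits with entry.S clearing ebp. The names are hypothetical, and this is not the mon_backtrace implementation itself.

```c
#include <stdint.h>
#include <stddef.h>

/* Simulated stack memory: `mem` holds 32-bit words and an "address" is a
 * byte offset into it. Address 0 plays the role of the cleared ebp that
 * ends the chain. */
struct frame_info {
    uint32_t ebp;   /* frame pointer of this frame */
    uint32_t eip;   /* return address stored at ebp + 4 */
    uint32_t arg0;  /* first argument stored at ebp + 8 */
};

static uint32_t load32(const uint32_t *mem, uint32_t addr)
{
    return mem[addr / 4];
}

/* Follows saved-ebp links starting at `ebp`; fills at most `max` records
 * and returns how many frames were visited. */
size_t walk_frames(const uint32_t *mem, uint32_t ebp,
                   struct frame_info *out, size_t max)
{
    size_t n = 0;
    while (ebp != 0 && n < max) {
        out[n].ebp  = ebp;
        out[n].eip  = load32(mem, ebp + 4);
        out[n].arg0 = load32(mem, ebp + 8);
        n++;
        ebp = load32(mem, ebp);  /* the caller's saved ebp */
    }
    return n;
}
```

In the kernel, the same loop would start from the current frame pointer and dereference real addresses instead of array offsets, printing each record as it goes.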
Exercise 12
Modify the stack backtrace function to display, for each eip, the function name, source file name, and line number corresponding to that eip.
In kern/kernel.ld, we can find the code that includes debugging information in kernel memory.
/* Include debugging information in kernel memory */
In the output of objdump -h obj/kern/kernel, we can find the .stab and .stabstr sections among the sections of the kernel.
2 .stab 00004375 f010229c 0010229c 0000329c 2**2
In the output of objdump -G obj/kern/kernel, we can examine the contents of the .stab section, which holds all the symbols.
❯ objdump -G obj/kern/kernel
Then we run gcc -pipe -nostdinc -O2 -fno-builtin -I. -MD -Wall -Wno-format -DJOS_KERNEL -gstabs -c -S kern/init.c, and look at init.s.
We can see the .stabs directives in it.
.Ltext0:
So the boot loader loads the symbol table into memory as part of loading the kernel binary, and debuginfo_eip reads it.
We add this piece of code to debuginfo_eip to find the line number.
// Search within [lline, rline] for the line number stab.
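The idea of mapping an address to a line number by binary search can be shown with a simplified stand-in. The struct and function below are hypothetical and much simpler than the real stab entries and the kernel's search routine; they only illustrate finding the last entry whose address does not exceed the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a line-number stab: an address and a line. */
struct line_entry {
    uint32_t addr;  /* start address of the code for this line */
    int      line;  /* source line number */
};

/* Returns the line of the last entry with addr <= target among entries
 * sorted by ascending addr, or -1 if there is none. */
int lookup_line(const struct line_entry *tab, size_t n, uint32_t target)
{
    size_t lo = 0, hi = n;  /* search window [lo, hi) */
    int found = -1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= target) {
            found = tab[mid].line;  /* candidate; look for a later one */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return found;
}
```

Returning -1 when nothing matches mirrors the need to report failure when no line entry covers the address.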
And we modify the function mon_backtrace like this:
int
K> backtrace
Now, finally, lab1 is finished. Run make grade to see the grades.
❯ make grade