Annotated Assembly Code for a Boot Record

16/08/18, modified 24/09/18

Below are my notes to help me understand the boot code published here: http://btcbase.org/log/2018-07-06#1832315. The boot loader is the first code to run after the BIOS. It is 512 bytes long and is loaded by the BIOS; in turn it loads the rest of the OS / application, switches to 64bit mode and starts to execute that code.

1
2
3
4
5
6
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Boot Loader - QEMU Variant
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
        payload_blocks   equ 14        ;; N * 512b blocks to load
        stack_top        equ 0x90000   ;; top of stack
	kernel_offset    equ 0x1000    ;; bottom of kernel

A number of constants are defined; the assembler will replace all occurrences of these names with the values after equ.

9
    	[BITS 16]

All lines after the '[BITS 16]' statement will be assembled as 16-bit Intel code. The boot process always starts with the processor in "real mode"; in this mode all code is supposed to follow the 16-bit instruction set of the 8086.

10
        section .text

Code and data can be assembled into sections; the boot program is contained in a single section, which is labelled ".text".

11
	jmp     init

First line of actual code, a jump instruction to the body of the code. Between the jmp and the body, some data and utility functions can be defined.

12
13
14
15
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
gdtr:
    	dw	gdt_end - gdt - 1 ; GDT limit
	dw	gdt		  ; GDT base

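A definition for a Global Descriptor Table. This particular definition is for an empty table with just one entry, the null descriptor. This GDT will not be used and can be removed from the file. A GDT is a simple vector of 64-bit (8 byte) elements. The GDT register contains the length of the table and a pointer to the table. Here the first 2 bytes encode the length (in bytes, not in elements) and the second 2 bytes the position. The length in bytes must be decreased by 1, so the stored value is the offset of the last valid byte. Note that lgdt reads a 6-byte operand (a 2-byte limit followed by a 4-byte base), so the base is normally defined with dd rather than dw.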

17
18
19
20
21
22
23
gdt:	times 8 db 0		; null descriptor
gdt_end:
	gdt64		dq 0x0000000000000000
	.code 		equ $ - gdt64
	dq 		0x0020980000000000
	.data 		equ $ - gdt64
	dq 		0x0000900000000000

A definition for a GDT that will be used. The first element is zero (the processor reserves entry 0 as the null descriptor), the second is for the code segment. The statement on line 20 defines a constant (it is not the same as the .code section in assembly or object files); the constant will have a value of 8, the byte offset of the code descriptor in the table. The code segment element defines the offset in memory where that segment starts, its size and some flags.

To decode a GDT entry, label the bytes from right to left, starting at 0 and ending at 7. The base (start address) is constructed from bytes 7, 4, 3, 2 and is a 32-bit value. The size (number of bytes or pages) is a 20-bit value constructed from bytes 0 and 1 and the low half of byte 6. The high half of byte 6 defines the size flags. Byte 5 is used for the access flags.

In the number 0x0020980000000000, base and limit are both zero. The size flags field is 0x2 or 0b0010; the bit that is set is the L bit, which means this is a 64bit code descriptor. The access field is 0x98, or 0b10011000. From high to low: 1 == valid (present) entry, 00 == privilege, ring 0, 1 == code or data segment (0 would be a system segment), 1 == executable, 0 == non-conforming, code can be run only from ring level 0, 0 == code segment cannot be read (it can never be written to by definition), 0 == accessed bit, will be set by the processor.

Lines 22 + 23 describe a data segment with the same base as the code segment; this is not a 64bit segment. The flags are 0x90, or 0b10010000, which means a valid data entry with ring 0 privilege that grows up and is not writable. The last entry (the gdt64.pointer label that lgdt uses at line 154) is not an entry in the table but the contents for the GDT register: first a 16bit length in bytes (minus the 1), next the position of the start of the table. How these flags, bases and lengths work out will hopefully become clear in the memory handling code.
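As a cross-check of the decoding described above, here is a minimal Python sketch that splits a descriptor into its fields. It is not part of the boot code; the function name decode_descriptor and the dictionary keys are made up for illustration.

```python
def decode_descriptor(descriptor):
    """Split a 64-bit GDT descriptor into its fields."""
    b = descriptor.to_bytes(8, "little")   # b[0] is the rightmost (lowest) byte
    access = b[5]
    size_flags = b[6] >> 4                 # high half of byte 6
    return {
        "base": b[2] | (b[3] << 8) | (b[4] << 16) | (b[7] << 24),
        "limit": b[0] | (b[1] << 8) | ((b[6] & 0x0F) << 16),
        "present": bool(access & 0x80),
        "ring": (access >> 5) & 0x3,
        "code_or_data": bool(access & 0x10),
        "executable": bool(access & 0x08),
        "conforming_or_grows_down": bool(access & 0x04),
        "readable_or_writable": bool(access & 0x02),
        "accessed": bool(access & 0x01),
        "long_mode": bool(size_flags & 0x2),   # the L bit
    }

code = decode_descriptor(0x0020980000000000)
data = decode_descriptor(0x0000900000000000)
print(code)
print(data)
```

The two descriptors differ only in the executable bit and the L bit; base and limit are zero in both.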

29
30
31
32
33
34
35
36
37
38
DiskPacket:
	db	0x10
	db	0
d_blk:	dw	payload_blocks	; int 13 resets this to # of blocks actually read/written

db_off:	dw	after_me	; memory buffer destination offset
db_seg:	dw	0	        ; memory buffer destination segment

d_lba:	dd	1		; put the lba to read in this spot
	dd	0		; more storage bytes only for big lba's ( > 4 bytes )

The BIOS provides services to the boot program; one of these services is reading sectors from the disk. The service needs a structure (the "disk address packet") filled with the size of the structure itself (0x10 bytes), the number of sectors to read from the disk (14 in this code), where to put the data that is read (just after the code that was loaded from the same disk and is now running) and the LBA address (1 is the block just after the boot block).
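The layout of the packet can be reproduced with Python's struct module. This is only a sketch to show the 16 bytes the BIOS will see; build_disk_packet is a made-up helper, and the value 0x7e00 for after_me assumes the boot block is loaded at 0x7c00.

```python
import struct

PAYLOAD_BLOCKS = 14
AFTER_ME = 0x7E00   # assumption: boot block at 0x7c00, payload right behind it

def build_disk_packet(blocks, offset, segment, lba):
    # little-endian: size byte, reserved byte, block count (dw),
    # buffer offset (dw), buffer segment (dw), 64-bit LBA (two dd's)
    return struct.pack("<BBHHHQ", 0x10, 0, blocks, offset, segment, lba)

packet = build_disk_packet(PAYLOAD_BLOCKS, AFTER_ME, 0, 1)
print(len(packet), packet.hex())
```

The first byte of the packet is its own length, which is why DiskPacket starts with db 0x10.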

40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
read_sector:
 	mov 	si, DiskPacket		; address of "disk address packet"
	mov 	ah, 0x42		; AL is unused
	mov	dl, [BootDrv]
	or 	dl, 0x80		; drive number 0 (OR the drive # with 0x80)
	int 	0x13
	jc 	bad_disk
	inc	dword [d_lba]
	ret
bad_disk:
        mov     si, disk_sad_msg
        call    print
halt:
        hlt
        jmp halt

The call to read blocks from the hard disk. The BIOS will load the first block and put this block at 0x7c00. The other blocks need to be loaded by the boot code (and will be placed at 0x7e00). This is a standard implementation of how to call the BIOS and load the blocks. The service is activated by the 0x13 interrupt with the AH register set to 0x42, the DL register set to the boot drive and SI pointing to the disk packet. The service will set the carry flag on any error, and the boot code will then print a message and halt the machine. As for line 47, the dword at the d_lba address is increased so that a next call would read the following block. That only helps when read_sector is called in a loop that reads one block per call; this code calls it once for all 14 blocks, so the increment has no effect here and looks like a leftover.

57
58
59
60
61
62
63
64
65
66
67
68
69
70
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Print string at si using bios console
print:
        mov    al, [si]
        inc    si
        or     al, al
        jz     end_print    ; end at NUL
        mov    ah, 0x0e     ; op 0x0e
        mov    bh, 0x00     ; page number
        mov    bl, 0x07     ; color
        int    0x10         ; INT 10 - BIOS print char
        jmp    print
end_print:
        ret

Print the characters in a zero-terminated buffer one at a time using a BIOS service (interrupt 0x10 with AH set to 0x0e).

75
76
77
78
79
data:
        start_msg      db 13, 10, "Loading payload from disk...", 13, 10, 0
	end_msg        db "Running Payload...", 13, 10, 0
        disk_sad_msg   db "Disk Error!", 13, 10, 0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Text strings to print: 13 == CR, 10 == LF, 0 is the end-of-string byte.

81
	BootDrv        db 0 ; drive that we booted from

Byte to store the number of the boot drive

85
86
87
88
89
init:
        xor     ax, ax
        mov     ds, ax
        mov     es, ax
        mov     ss, ax

Set ax to zero and copy this value into ds (data segment), es (extra segment), ss (stack segment).

90
91
       	mov	bp, 0x9c00  ; init realmode stack
	mov     sp, bp

Set up a stack location; note that this is 8k bytes removed from the start of the boot code. The current minimal OS code is 3.3k, so this is far enough away.

The stack is only used for a couple of calls in this boot code. Each call pushes 1 word (the return IP) and each BIOS interrupt pushes 3 words (flags, CS and IP) plus whatever the BIOS handler itself needs, so the stack grows down by only a small amount.
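The distances can be checked with a little arithmetic. The sketch below is Python, not part of the boot code; it assumes the boot block sits at 0x7c00 and the payload buffer (after_me) follows directly at 0x7e00. The variable names are made up.

```python
SECTOR_SIZE = 512
BOOT_SECTOR = 0x7C00       # where the BIOS puts the boot block
PAYLOAD_BLOCKS = 14        # payload_blocks in the listing
REAL_MODE_STACK = 0x9C00   # value loaded into bp/sp

payload_start = BOOT_SECTOR + SECTOR_SIZE
payload_end = payload_start + PAYLOAD_BLOCKS * SECTOR_SIZE
stack_distance = REAL_MODE_STACK - BOOT_SECTOR   # distance from start of boot code
stack_room = REAL_MODE_STACK - payload_end       # room below the stack top

print(hex(payload_start), hex(payload_end), stack_distance, stack_room)
```

The BIOS always transfers all 14 blocks, whatever the size of the OS code, so the buffer runs up to 0x9a00 and 512 bytes remain between the end of the buffer and the top of the stack.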

92
        mov	[BootDrv], dl  ; where we booted from

The bios will fill the lower part of the dx register with the index of the boot drive, store this index in memory

96
97
	mov     si, start_msg
        call    print

Print a start message to the boot screen

98
	call    read_sector

Read the rest of the code (the payload) from the disk

 99
100
	mov     si, end_msg
        call    print

Print an end message, the payload has been read

101
        cli

Clear the interrupt flag, so the processor ignores maskable interrupts during the mode switch

102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
	;; enable a20
	call	a20_loop
	jnz	a20_done
	mov	al, 0xD1
	out	0x64, al
	call	a20_loop
	jnz	a20_done
	mov	al, 0xDF
	out	0x60, al
a20_loop:
	mov	ecx, 0x20000
a20_loop_2:
	jmp 	short a20_c
a20_c:
	in	al, 0x64
	test	al, 0x2
	loopne	a20_loop_2
a20_done:

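An internet search for the A20 line will inform you on some interesting properties of the Intel processors. In short, address line 20 (A20) is disabled at boot, so addresses wrap around at 1mb and memory above 1mb cannot be reached reliably; to get to 64bit mode the address line has to be enabled. The most standard method to enable the line is to send a message to the keyboard controller, and this is what this code attempts. Strangely, the a20_loop code is missing a 'ret' statement after line 118. The first call at line 103 therefore never returns: the loop falls through into a20_done, and lines 104 to 110 are never executed. The loop itself ends either when the controller reports an empty input buffer (the Z-flag is set) or when the counter runs out (the Z-flag is still clear), so with a 'ret' in place the jnz statements at 104 and 108 would skip the rest of the sequence on a timeout. The jump at line 114 is there to get a small delay. After line 110 an extra call to the loop and an unconditional jump to a20_done should be added, otherwise the code would run into a20_loop again and hit the 'ret' without a return address on the stack. The boot code works, but only because the QEMU BIOS already enables the A20 line.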
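The two ways the polling loop can end are easier to see in a small simulation. This is a Python imitation of lines 113 to 118, not real hardware access; the function name, the list of status values and the counter values are all hypothetical.

```python
def poll_until_ready(status_reads, count):
    """Imitate the in / test / loopne sequence of the listing.

    status_reads: successive values read from port 0x64 (made up here).
    count: loop counter, standing in for the count register.
    Returns (zf, count) as they are when loopne falls through.
    """
    zf = False
    for status in status_reads:
        zf = (status & 0x2) == 0   # test al, 0x2: ZF is set when bit 1 is clear
        count -= 1                 # loopne first decrements the counter
        if zf or count == 0:       # it jumps back only if ZF == 0 and count != 0
            break
    return zf, count

ready = poll_until_ready([0x02, 0x02, 0x00], 10)    # controller becomes ready
timeout = poll_until_ready([0x02, 0x02, 0x02], 3)   # counter runs out first
print(ready, timeout)
```

In the first case ZF is set and a following jnz is not taken; in the second case ZF is clear and jnz jumps to a20_done.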

123
124
	xor	bx, bx
	mov	es, bx

Build a PML4 page table; first set up the registers. I will need to look up how these page tables work. Set the BX register to 0 and copy this value to ES. ES should still be zero from the code at line 88, but it may be that the register was changed in the BIOS code.

125
	cld

Clear the direction flag, so that the following string operations increase DI.

126
127
128
	mov	di, 0xA000
	mov	ax, 0xB00F
	stosw

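Store the value 0xB00F at address 0xA000 (ES:DI) and increase DI by 2. This is the low word of the first PML4 entry: it points to the next table at 0xB000, and the low bits 0x00F are flags (present, writable, user, write-through).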

129
130
131
	xor	ax, ax
	mov	cx, 0x07FF
	rep 	stosw

Store the word (2 byte) value 0 for 2047 times; together with the first word this fills the 4k byte table and leaves all other entries zero.

132
133
134
135
136
	mov	ax, 0xC00F
	stosw
	xor	ax, ax
	mov	cx, 0x07FF
	rep 	stosw

The PDP table at 0xB000: the first entry is 0xC00F, which points to the next table at 0xC000 with the same flags; the rest of the table is zero.

137
138
139
140
141
	mov	ax, 0x018F
	stosw
	xor	ax, ax
	mov	cx, 0x07FF
	rep 	stosw

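The PD table at 0xC000: the first entry is 0x018F, the rest of the table is zero. In 0x018F the page size bit (bit 7) is set, so this entry maps a 2mb page starting at physical address 0 instead of pointing to another table; the other bits are present, writable, user, write-through and global.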

This ends the set-up of the paging tables.
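To see what the three table entries mean, the flag bits and the table indexes of a virtual address can be pulled apart as below. This is a Python sketch with made-up names (FLAG_NAMES, decode_entry, split_address), assuming the usual 4-level layout with 9 index bits per level and a 2mb page at the PD level.

```python
FLAG_NAMES = {0: "P", 1: "RW", 2: "US", 3: "PWT", 4: "PCD",
              5: "A", 6: "D", 7: "PS", 8: "G"}

def decode_entry(entry):
    """Return (address, flag names) for the low bits of a table entry."""
    address = entry & ~0xFFF   # the low 12 bits are flags
    flags = [name for bit, name in FLAG_NAMES.items() if entry & (1 << bit)]
    return address, flags

def split_address(virtual):
    """Split a virtual address into PML4, PDP and PD indexes plus offset."""
    pml4_index = (virtual >> 39) & 0x1FF
    pdp_index = (virtual >> 30) & 0x1FF
    pd_index = (virtual >> 21) & 0x1FF
    offset = virtual & 0x1FFFFF   # 21 bits inside a 2mb page
    return pml4_index, pdp_index, pd_index, offset

for entry in (0xB00F, 0xC00F, 0x018F):
    print(hex(entry), decode_entry(entry))
print(split_address(0x90000))    # stack_top
print(split_address(0x200000))   # first address past the mapped 2mb
```

Only index 0 is filled in at every level, so any address below 2mb (such as stack_top) is mapped to itself, and anything from 0x200000 upwards is not mapped at all.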

143
144
	mov 	eax, 10100000b		; PAE, PGE
	mov 	cr4, eax

To enable 64bit mode, this code sets two bits in the CR4 control register: (1) the Physical Address Extension (bit 5), which enables the page table format with 64-bit entries that long mode requires (on 32bit systems it allows 36 bit instead of 32 bit physical addresses), and (2) Page Global Enable (bit 7), which enables global pages that are maintained for all tasks. Only PAE is required for 64bit mode. The Intel documentation notes for PGE that the PG flag (in CR0) must be set first; in this code it will be set after this statement, at lines 151-153. Note that even in real mode the 32bit registers are available.

145
146
	mov 	edx, 0x0000A000		; PML4
	mov 	cr3, edx

The address of the paging table is stored in CR3 (and 0xA000 was used in the setup for the paging tables)

147
148
149
150
	mov 	ecx, 0xC0000080		; EFER.LME
	rdmsr				; long mode!
	or 	eax, 0x00000100
	wrmsr

Change a Model Specific Register. The address of the register must be put in ECX; rdmsr puts the value of the register in EDX:EAX and wrmsr writes it back from there. In this case a bit in the MSR IA32_EFER must be set; its address is 0xC0000080. The bit (bit 8, LME) enables the IA-32e mode, which becomes active once paging is turned on. As long as CS does not refer to a code segment descriptor with the 64bit flag set, the processor runs in the so called "compatibility mode". The actual mode (64bit or less) is determined from the descriptor in the GDT that CS is loaded from, and in gdt64.code the 64bit flag was set.

151
152
153
	mov	ebx, cr0		; long mode
	or	ebx, 0x80000001		; Paging and protection
	mov	cr0, ebx		; Skip pmode

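Enable paging (bit 31, PG) and protected mode (bit 0, PE) in one step. Because LME is already set, the processor goes straight from real mode into long mode without a stop in 32bit protected mode.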
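The three constants written to CR4, EFER and CR0 are plain bit masks. A short Python sketch (the helper set_bits and the constant names are made up) lists which bit positions each constant sets:

```python
def set_bits(value):
    """Return the positions of the bits that are 1 in value."""
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]

CR4_VALUE = 0b10100000    # mov eax, 10100000b
EFER_VALUE = 0x00000100   # or eax, 0x00000100
CR0_VALUE = 0x80000001    # or ebx, 0x80000001

print("CR4 :", set_bits(CR4_VALUE))    # bit 5 = PAE, bit 7 = PGE
print("EFER:", set_bits(EFER_VALUE))   # bit 8 = LME
print("CR0 :", set_bits(CR0_VALUE))    # bit 0 = PE, bit 31 = PG
```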

154
	lgdt	[gdt64.pointer]

The GDT register is loaded, and CPU will use the GDT from now on

155
 	jmp	gdt64.code:longmode     ; CS, 64b seg

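A far jmp; nasm generates the code for this. It loads CS with the selector gdt64.code (the value 8, the offset of the code descriptor in the GDT) and continues at longmode. As the descriptor at gdt64.code has the 64bit flag set, the jmp is into a 64 bit segment.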
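The value loaded into CS is a selector, not an address. A minimal Python sketch of how a selector is split up (decode_selector is a made-up helper; the values 8 and 16 are the .code and .data constants from the gdt64 definition):

```python
def decode_selector(selector):
    """Split a segment selector into table index, table and privilege level."""
    return {
        "index": selector >> 3,                        # entry number in the table
        "table": "LDT" if selector & 0x4 else "GDT",   # table indicator bit
        "rpl": selector & 0x3,                         # requested privilege level
    }

print(decode_selector(8))    # gdt64.code
print(decode_selector(16))   # gdt64.data
```

Because each descriptor is 8 bytes, the byte offset of an entry in the GDT doubles as its selector when the low three bits are zero.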

156
[BITS 64]

Generate 64 bit code starting from this point

158
159
160
161
162
	;; set up new code/data/stack segments
        mov     ebp, stack_top
	mov     esp, ebp
	extern main
        jmp main

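Set up the C stack (at stack_top, 0x90000) and jump to main. Note that this is a jmp and not a call, so main has no return address and must never return.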

164
	times	510-($-$$) db 0

Fill up any leftover space with zero bytes but leave out the 2 last bytes

165
166
bootsig:
	dw 0xAA55

All boot sectors end with the two bytes 0x55 and 0xAA; the word 0xAA55 is stored low byte first.
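The padding and the signature together always give a block of exactly 512 bytes. A Python sketch of the same construction (pad_boot_sector is a made-up helper, and the two code bytes are only a stand-in for real boot code):

```python
import struct

def pad_boot_sector(code):
    """Pad code with zero bytes to 510 bytes and append the signature word."""
    if len(code) > 510:
        raise ValueError("boot code does not fit in one sector")
    # same idea as: times 510-($-$$) db 0, followed by dw 0xAA55
    return code + bytes(510 - len(code)) + struct.pack("<H", 0xAA55)

sector = pad_boot_sector(b"\xeb\xfe")   # stand-in code: a jump to itself
print(len(sector), sector[510:].hex())
```

On disk the byte at offset 510 is 0x55 and the byte at offset 511 is 0xAA, because dw stores the low byte first.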

168
after_me:

Label to use for loading the data from this disk into physical memory.
