16-bit OSDev, on Repl and Linux, part 1
MocaCDeveloper (713)

Hi!

In this tutorial, we will go over 16-bit OS development on Repl as well as on Linux.

Setup

In the last tutorial I mentioned that we could not use the TCC compiler on Linux (or Repl). However, I have found a way to do so!

But first, let's create a new Repl so we can actually write and run an OS. Repl has support for Nix, which, for our purposes, is just a package manager.
So, let's go ahead and create a new Nix Repl, and within the .nix file add the following to give us access to qemu and nasm (we need nasm later to assemble the bootloader):

{ pkgs }: {
	deps = [
		pkgs.qemu
		pkgs.nasm
	];
}

I don't know too much about Nix, however, this seemed to work fine.

Now, to install TCC, I will go ahead and give you the download directly here.
After the download is done, import it into your Repl, and within the shell, type tar xjf tcc-0.9.27.tar.bz2.
This will create a folder named tcc-0.9.27. Enter this folder and type ./configure. After that is done, type make, which builds the tcc executable. If the file is not already marked as executable, type chmod +x tcc. Right now the executable sits inside this folder, which is not much use to us, so let's move it out so we can use it throughout our project. Type mv tcc /home/runner/NameOfRepl (replacing NameOfRepl with the name of your Repl). This moves tcc to the parent directory.

Linux

Now, for Linux, let's install qemu by typing sudo apt-get install qemu (on some newer Debian and Ubuntu releases the package that provides qemu-system-i386 is named qemu-system-x86 instead). The sudo prefix runs the command as root, which installing packages requires. Then, to install nasm, type sudo apt-get install nasm.
To install TCC, we follow the same steps as we did for Repl. Download the TCC compiler, and on your Linux machine type tar xjf tcc-0.9.27.tar.bz2. This will "dump" all the files into the folder tcc-0.9.27. Enter the folder and type ./configure. Then, type make. This will create the executable tcc. Let's make it available system-wide by typing sudo mv tcc /usr/bin. By doing this, we will have access to the executable tcc throughout our Linux machine.

The bootloader

First of all, what is a bootloader? The bootloader is the first program of any OS to run. It lives in what is called the boot sector, and it ends with the "magic number" that the BIOS expects to see.

Now, what is a boot sector? A boot sector is a small program that resides in the first 512 bytes of the disk. Why just 512 bytes though?
The boot sector occupies the first sector of a floppy (or hard disk), and each sector is exactly 512 bytes. The BIOS expects the "magic number" to be in the last 2 bytes of those 512 bytes. This "magic number" tells the BIOS that this is, indeed, a boot sector.
Although this is a small amount, 512 bytes is really a lot for a boot sector, especially given that you can have multiple bootloaders.

But, what does the boot sector do? Very briefly, the boot sector is the key to any OS. Without the magic number in the last 2 bytes, the BIOS will not boot from the disk at all.
Normally, the first bootloader (the boot sector) just reads more sectors from the disk to give us room for more code. When there is a second bootloader, the boot sector loads its sectors from the disk and then jumps to it.

In our case, we will just be working with a single bootloader, for simplicity. Now, the boot sector is loaded at the address 0x7C00 by the BIOS. NASM has the directive org, which tells the assembler the address the program will be loaded at, so that it calculates the addresses of labels correctly. This is an advantage for us, because without org we'd have to account for that offset ourselves.

Now, what is this "magic number"? The magic number is the byte pair 55 aa (in hex). These are the last two bytes of the 512-byte boot sector, at offsets 510 and 511. If they are not found, the BIOS will simply say "nah" and the OS will fail to boot.
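To make the layout concrete, here is a minimal shell sketch that builds a 512-byte image by hand and checks it for the size and signature described above. The function name is_boot_sector and the file name sector-demo.bin are made up for this illustration, and the image holds no real code, only zeros plus the two signature bytes.

```shell
#!/bin/sh
# is_boot_sector FILE: succeeds if FILE is exactly 512 bytes long and
# its last two bytes are 0x55 0xAA. (Hypothetical helper for this sketch.)
is_boot_sector() {
    size=$(wc -c < "$1" | tr -d ' ')
    sig=$(tail -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$size" = "512" ] && [ "$sig" = "55aa" ]
}

# Build a demo image: 510 zero bytes, then 0x55 and 0xAA
# (written as the octal escapes \125 and \252 for portability).
{ head -c 510 /dev/zero; printf '\125\252'; } > sector-demo.bin

if is_boot_sector sector-demo.bin; then
    echo "sector-demo.bin looks like a boot sector"
else
    echo "sector-demo.bin is NOT a boot sector"
fi
```

Once boot.bin has been assembled later in this tutorial, the same function can be pointed at it with is_boot_sector boot.bin.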

[org 0x7C00]
bits 16

jmp $

times 510 - ($ - $$) db 0
dw 0xaa55

Now, let's look over the code:

  • [org 0x7C00], this tells the assembler that this code will be loaded at the address 0x7C00
  • bits 16, this tells the assembler to generate 16-bit code
  • jmp $, this jumps to the current instruction over and over (an infinite loop, which effectively "halts" the OS)
  • times 510 - ($ - $$) db 0, this pads the program with zeros up to byte 510 ($ - $$ is the size of the program so far)
  • dw 0xaa55, this writes the "magic number" into the last 2 bytes

Now, why do we pad to 510 bytes rather than 512? Well, the last two bytes of the first 512 bytes must be the "magic number". If we padded all the way to 512 bytes, the magic number would land just past the end of the boot sector, and the last 2 bytes of the sector would be zeros instead.
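The arithmetic behind the times line can be checked in the shell. This sketch assumes the code before the padding is 2 bytes long; code_size is a made-up variable standing in for ($ - $$), and the real size depends on what you assemble.

```shell
#!/bin/sh
# code_size stands in for ($ - $$); 2 bytes is an assumed value.
code_size=2
padding=$((510 - code_size))        # what "times 510 - ($ - $$) db 0" emits
total=$((code_size + padding + 2))  # + 2 for the bytes written by dw 0xaa55
echo "padding=$padding total=$total"
```

Whatever value code_size has (up to 510), the total always comes out to 512, which is the point of subtracting the program size from 510.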

But wait Moca, you said the magic number was '55aa', why is it 0xaa55 in the code? The x86 architecture is little endian, so NASM writes multi-byte values in little endian order.

What is little endian?
Little endian is simply a byte order. It stores the least significant byte at the smallest memory address, and the most significant byte at the largest.
So dw 0xaa55 is written to the file as the byte 55 followed by the byte aa. If you have no prior knowledge of little endian, you'd probably get stumped when you run into '55aa' in the binary output and 0xaa55 in the code.
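A quick way to see which byte ends up first is to split the 16-bit value 0xaa55 into its low and high byte in the shell. On x86 the low byte is the one stored at the smaller address, so it comes first in the file. This is only an illustration of the byte order, not something the bootloader itself needs.

```shell
#!/bin/sh
word=$((0xaa55))
low=$((word & 0xff))           # least significant byte
high=$(((word >> 8) & 0xff))   # most significant byte
# x86 stores the low byte at the lower address, so it is written first.
bytes=$(printf '%02x %02x' "$low" "$high")
echo "$bytes"
```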

Basic setup

Now, we've got the basics of a bootloader program down. But we still need to do a couple of things. First, we have to set the registers we'll be using to zero, and second, we have to set up the stack.

Here are a few of the registers available to us in 16-bit assembly:

  • dx
  • cx
  • ds
  • ss

Let's go ahead and set these to zero (ss will follow when we set up the stack):

[org 0x7C00]
bits 16

xor ax, ax
mov dx, ax
mov cx, ax
mov ds, ax

jmp $

times 510 - ($ - $$) db 0
dw 0xaa55

Now, you might be wondering, are those really the only registers? The answer is no, there are more (ax, bx, es, si and di, for example). Here is what the ones above are:

  • dx is a general-purpose register with a high and a low byte (dh, dl)
  • cx is a general-purpose register with a high and a low byte (ch, cl)
  • ds is the data segment register, which holds the segment used when the program reads or writes data
  • ss is the stack segment register, which holds the segment the stack lives in
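In 16-bit real mode the CPU forms a physical address as segment * 16 + offset. The sketch below works that out for ds = 0; the offset 0x7C10 is a made-up example of an address the assembler might compute with org 0x7C00, not something taken from the code in this tutorial.

```shell
#!/bin/sh
# physical = segment * 16 + offset (real-mode address calculation)
ds=0
offset=$((0x7C10))   # assumed example offset
physical=$((ds * 16 + offset))
printf 'ds=0x%04X offset=0x%04X -> physical 0x%05X\n' "$ds" "$offset" "$physical"
```

With ds left at zero, the physical address equals the offset, which is why zeroing ds pairs well with org 0x7C00.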

Why didn't we set ss to zero yet? Simply put, it's considered "safe" assembly code to assign ss and sp right next to each other, so that nothing can use the stack while it is only half set up.

Let's go ahead and set up the stack.

[org 0x7C00]
bits 16

xor ax, ax
mov dx, ax
mov cx, ax
mov ds, ax

cli
mov bp, 0x7C00
mov ss, ax
mov sp, bp
sti

jmp $

times 510 - ($ - $$) db 0
dw 0xaa55

Now, what do cli and sti do? cli clears the interrupt flag (IF), which makes the CPU ignore maskable hardware interrupts. sti sets the interrupt flag (IF) again, which re-enables them. We disable interrupts while changing ss and sp so that no interrupt handler runs while the stack is only half set up.

Let's take a deeper look

We can safely assume that, since we are loaded at 0x7C00, setting the base pointer register (bp) to 0x7C00 is ok. And, indeed, it is: our code occupies 0x7C00 and upwards, while the stack will use the memory below 0x7C00, so the two never collide. Other free addresses would also work, as long as they fit in a 16-bit register (0x100000 does not). For simplicity, I'll be setting it to 0x7C00.

Now, you might be thinking that the stack grows upwards, however, you'd be surprised.

It actually grows downwards.

Let's talk about the stack

Now, why on earth does the stack grow...downwards?

Let's think of malloc. When we want access to x amount of memory, we allocate it via malloc. Let's apply this concept to the stack. Let's say we want 16 bytes of "memory". The base pointer (bp) marks where the stack starts, which also decides how much "memory" is available below it. The stack pointer (sp) marks how far the stack has grown so far, so bp - sp is the "memory" used.

So, as mentioned, if we want 16 bytes of "memory", we'd assign the base pointer register accordingly, being 0x10. This is where the understanding of a seg fault comes in.

Let's say we do want just 16 bytes of "memory" available to us to "store" stuff on the stack. In 16-bit mode every push stores 2 bytes, so we push 8 values, thus being 16. That's okay! No problem! Now, let's say we push one more value onto the stack. Oh no! The sp register would go into the negatives. Under a regular OS, running off the end of your memory like this is what causes a seg fault. In 16-bit real mode nothing stops us: sp simply wraps around to the top of the segment and we overwrite whatever lives there.

So, to understand why the stack grows downwards, just think of it as "malloc". bp is where the stack starts "growing" from, and the memory below it is the "allocated" memory available to us, meaning everything stored on the stack is placed underneath the address held in the base pointer register.

Along with the stack come a couple of instructions that you're probably quite familiar with, push and pop.

push stores a value onto the stack (moving sp down), while pop retrieves (or restores, in other words) the most recently pushed value from the stack (moving sp back up).
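To see the downward growth in numbers, here is a small shell simulation of sp. It assumes the setup used in this tutorial (bp and sp both starting at 0x7C00) and 2-byte pushes. The push and pop functions here only imitate the arithmetic the CPU does and do not store any values.

```shell
#!/bin/sh
bp=$((0x7C00))
sp=$bp

push() { sp=$((sp - 2)); }   # a 16-bit push moves sp down by 2
pop()  { sp=$((sp + 2)); }   # a 16-bit pop moves sp back up by 2

push
after_one=$sp
push
after_two=$sp
pop
pop

printf 'after 1 push:   0x%04X (bp - %d)\n' "$after_one" $((bp - after_one))
printf 'after 2 pushes: 0x%04X (bp - %d)\n' "$after_two" $((bp - after_two))
printf 'after 2 pops:   0x%04X\n' "$sp"
```

The first pushed value ends up at bp - 2, the second at bp - 4, and popping both brings sp back to where it started.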

Let's talk BIOS

What is the BIOS? BIOS stands for Basic Input Output System. It is the firmware built into the machine, and it has long been the standard way of booting OS's.

But, what does the BIOS have to offer? Well, the BIOS not only enables us to boot up our OS via the boot sector (and that magic number), it also comes with several subroutines that we call through interrupts, ranging from the basic interrupt 0x10 to more advanced interrupts such as reading from the disk.

In this tutorial, we'll be focusing on just working with the interrupt 0x10. This interrupt is quite powerful, believe it or not. With this interrupt you can do things such as change the graphics mode and print to the screen.

Now, the interrupt 0x10 isn't magical and won't just work without you specifying what you want, which is where function codes come in.

The BIOS function code for teletype printing is 0x0e. Normally, if not always, the function code is stored in the high byte of the ax register (ah). The other values go in specific registers depending on the function code.
So, ah will store the function code almost always, and in this case, it will store the function code for teletype printing. al will contain the character to print.
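Since ah and al are the two halves of ax, loading them separately is the same as loading one 16-bit value. A rough sketch of that in shell arithmetic, using the character 'A' as an example value to print:

```shell
#!/bin/sh
ah=$((0x0e))             # teletype function code
al=$(printf '%d' "'A")   # character code of 'A'
ax=$(((ah << 8) | al))   # ah is the high byte, al is the low byte
printf 'ax = 0x%04X\n' "$ax"
```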

Let's go ahead and test to see if the stack is working, shall we?

[org 0x7C00]
bits 16

xor ax, ax
mov dx, ax
mov cx, ax
mov ds, ax

cli
mov bp, 0x7C00
mov ss, ax
mov sp, bp
sti

push 'A'
mov ah, 0x0e
mov al, [bp - 2]
int 0x10

jmp $

times 510 - ($ - $$) db 0
dw 0xaa55

In the code above, we added in a few lines of code that did the following:

  • Pushed a 2-byte value onto the stack
  • Assigned the teletype function code 0x0e to ah
  • Assigned the value to be printed to al
  • Called the BIOS interrupt 0x10

Now, bp is set to 0x7C00, and the value pushed onto the stack takes up 2 bytes, so we can safely assume that the value 'A' sits at bp - 2.

Let's compile!

Now, I forgot this step when setting up the program: the bootloader code needs to go into a file, and you can name that file however you please. I named it boot.s, as that is what I always use when writing a bootloader.

Compilation on Repl

For simplicity, go ahead and create a .sh file. Within the file, add the following to A. assemble the code into a flat 16-bit binary and B. run it with qemu:

nasm boot.s -f bin -o boot.bin

qemu-system-i386 boot.bin

Then, let's go ahead and create a .replit file, and I think you get the gist of how things go with a .replit file (I named my .sh file make.sh):

run = 'bash make.sh'

Compilation on Linux

Now, for simplicity, we're going to be using a bash file again. But, instead of giving it a .sh extension, we're going to just name it run, or whatever you want to name it. Within this file, do the same as described above. For me, I have to pass qemu the -L option, which tells it the directory where its BIOS files live, in order for it to run successfully. This may not be the case for you, however.

If you want, for the sake of following along with the tutorial, type whereis qemu in your terminal, copy the very first path, and do as follows:

# filename: run

nasm boot.s -f bin -o boot.bin

qemu-system-i386 -L "/path/to/qemu" boot.bin

And, to enable ourselves to run via ./run, simply type chmod +x run.

Assuming that our code assembles without errors, we should see the value 'A' that was pushed onto the stack printed on the screen.

Summary

You should now have an insight into the startup of an OS on Repl and your local Linux machine. You should also have an insight into how to print a character in 16-bit assembly.

You should also have all the tools needed for this tutorial, those being nasm, qemu and tcc.

In the next tutorial we'll go deeper into BIOS interrupts, how we can print strings using the teletype function 0x0e, and reading sectors off the disk.

Until next time, MocaCDeveloper OUT!

Comments
17lwinn (0)

This is helping me write an OS more easily, keep on posting! (can you please ping me when another post is out?)

MocaCDeveloper (713)

@17lwinn

I sure can!! I am so happy that this is helping! In the next tutorial I will be going over more advanced concepts of the BIOS as well as different ways you can create and run 16-bit executables on Windows. I might add in a bit more information as well!

Highwayman (1501)

I thought sti and cli sounded familiar! I was like "why does this feel kind of strange??" Was that in the Windows tutorial? I may have missed it or something cause I remember being kind of confused.

MocaCDeveloper (713)

@Highwayman

Yes, sti and cli were described in the Windows tutorial, however I did not describe them that well in that tutorial.
All they do is this: sti enables the IF (interrupt flag), and cli disables the IF (interrupt flag).

Highwayman (1501)

Ahh I see, I guess I was just a bit confused before or something then. @MocaCDeveloper

MocaCDeveloper (713)

@Highwayman

I am also writing full-blown documentation (PDF) on 16-bit OSDev. The documentation will go further in depth and have more content. I will make sure to let you have first hands on it when I am done writing it!

Highwayman (1501)

:D OMG YES, Deeply appreciated! @MocaCDeveloper

DynamicSquid (5023)

Could you ping me every time you post something new :)

Infiniti20 (29)

@DynamicSquid same here lol. thatd be awesome

MocaCDeveloper (713)

@DynamicSquid @Infiniti20

Not a problem! I will make sure to ping both of you!

MocaCDeveloper (713)

@DynamicSquid @Infiniti20

I am going to have a tutorial uploaded sometime during the day or tonight!

Infiniti20 (29)

@MocaCDeveloper I'll be sure to check it out!

MocaCDeveloper (713)

@DynamicSquid

I am delaying it to sometime today due to hefty explanations needed within the tutorial. Apologies!

IMayBeMe (542)

I’m not even surprised anymore, these tutorials used to amaze me with their quality but now it has just become an expectation that you will post another one

MocaCDeveloper (713)

@IMayBeMe

I do not like being expected to constantly post another tutorial. I am still in the beginning stages of OSDev, and I am simply uploading tutorials to share my journey into OSDev with others. Although I am just in the beginning stages, I have quite a lot of information to share.

But I agree. These tutorials tend to become more of a demand rather than "oh wow! An OSDev tutorial!!"

IMayBeMe (542)

@MocaCDeveloper just to clarify I do not expect these tutorials but rather have lost that shock factor as I have learned that any tutorial from you will have a high quality and in depth explanation of the topic