The Basics of x86
[deleted]

Introduction

x86 is one of the most common computer architectures in use today. Many personal computers and devices use either x86 or ARM CPUs. But what does that mean? What is the difference?

The Instruction Set

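As you might have heard before, the way code is run at the lowest level is machine code. Machine code is made up of instructions, each encoded as one or more bytes. Every instruction contains an opcode, which says what operation to perform, and most are followed by operands (arguments) such as a register or a number. On x86, a single instruction can be anywhere from 1 to 15 bytes long. In essence, an opcode is a bit like a function call. The set of all the instructions a machine understands is called its instruction set. x86 is what is called a CISC, or complex instruction set computer: it has a very big instruction set, with well over a thousand instructions depending on how you count the variants. ARM is what is called a RISC, or reduced instruction set computer. That means its instruction set is smaller and more regular, on the order of a few hundred instructions. Simpler designs tend to need less power, which is one reason ARM is so common in phones and embedded devices.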
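
To make "opcode plus operands" a bit more concrete, here is a rough Python sketch that builds the bytes for one real x86 instruction, mov with a 32-bit register and a constant. The helper name encode_mov_imm32 is made up for this example, and it only covers this one instruction form.

```python
import struct

# Register numbers used in x86 instruction encodings (32-bit registers).
REGISTER_NUMBERS = {
    "eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
    "esp": 4, "ebp": 5, "esi": 6, "edi": 7,
}

def encode_mov_imm32(register: str, value: int) -> bytes:
    """Encode `mov <register>, <value>` (hypothetical helper for illustration)."""
    opcode = 0xB8 + REGISTER_NUMBERS[register]       # opcode byte B8+r selects the register
    operand = struct.pack("<I", value & 0xFFFFFFFF)  # 32-bit constant, little-endian
    return bytes([opcode]) + operand

machine_code = encode_mov_imm32("eax", 42)
print(" ".join(f"{byte:02x}" for byte in machine_code))  # b8 2a 00 00 00
```

So `mov eax, 42` is five bytes: one opcode byte and a four-byte operand.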

Registers, Math, and the Stack

But what do all these opcodes actually do? Some of the most common ones are for manipulating registers and the stack. A register is similar to a variable: it just stores a value, and you can manipulate it. There are many registers in x86. First, there are the general purpose registers: eax, ebx, ecx, edx, edi, and esi, as well as r8 through r15 in 64-bit mode. You can store whatever data you want in these registers, and no one will complain. You might wonder what the names mean. The answer is that they indicate the size in bits of the register. If a register is named <x>h or <x>l, it is 8 bits. If it is named <x>x, it is 16 bits. If it is named e<x>x, it is 32 bits. Finally, if it is named r<x>x, it is 64 bits. These are not separate registers but views of the same one: ax is the low 16 bits of eax, and al and ah are the low and high bytes of ax.

x86 also has a stack. You have probably heard of a stack: you can push items onto it, but you can only pop the top item. The stack is just a region of memory, and it is tracked by two registers: ebp and esp. esp points to the top of the stack, and ebp usually points to the base of the current function's stack frame. The stack on x86 grows downward: when you push an item, esp moves down by the size of the item (4 bytes in 32-bit code). The stack is how C stores local variables: you might have one variable at ebp - 4, another at ebp - 8, and another at ebp - 12. The stack is manipulated with two instructions: push and pop. They are pretty self-explanatory: push pushes an item, and pop pops it.
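
Here is a toy Python model of the ideas above, not a real emulator. The class Stack32 and the function sub_registers are names I made up for this sketch; it assumes 32-bit items and does no overflow checking.

```python
import struct

class Stack32:
    """Toy model of a 32-bit x86 stack (hypothetical class, for illustration only)."""

    def __init__(self, size: int = 64):
        self.memory = bytearray(size)  # the stack is just a region of memory
        self.esp = size                # esp starts just past the highest address

    def push(self, value: int) -> None:
        self.esp -= 4                  # the stack grows downward, 4 bytes per item
        struct.pack_into("<I", self.memory, self.esp, value & 0xFFFFFFFF)

    def pop(self) -> int:
        (value,) = struct.unpack_from("<I", self.memory, self.esp)
        self.esp += 4                  # popping moves the top back up
        return value

def sub_registers(eax: int) -> dict:
    """Show the smaller views of a 32-bit register value."""
    return {
        "ax": eax & 0xFFFF,        # low 16 bits of eax
        "al": eax & 0xFF,          # low byte of ax
        "ah": (eax >> 8) & 0xFF,   # high byte of ax
    }

stack = Stack32()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # esp has moved down by 8 bytes
print(hex(stack.pop()))  # the most recently pushed item comes off first
print({name: hex(value) for name, value in sub_registers(0x12345678).items()})
```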

OSes and the Boot Process

Well that's all well and good, but what actually happens on boot up? If I turn on my computer, what does it do? How does it load my OS? What even is my OS? All those questions are covered by something called modes and the boot process. We will answer the last question first: an OS is made of a couple of parts: the bootloader and the kernel. The bootloader is what it sounds like: it is booted, and it loads the rest of the OS. The rest of the OS is called the kernel, and it is in charge of pretty much everything: the file system, drivers, userspace and so on. On most Linux systems, the bootloader is GRUB or Syslinux. Bootloaders are a complex topic, and I won't go into them here.

Now we will get to the actual boot process: how does the computer load the bootloader? When the computer turns on, it runs firmware called the BIOS, or Basic Input Output System. Newer machines use UEFI instead, which works differently; this tutorial covers the classic BIOS route. The BIOS loads 1 sector, a 512-byte block, from the boot disk to the address 0x7c00, checks that it ends with the signature bytes 0x55 0xAA, and then jumps there. This is why bootloaders are necessary: the BIOS only reads 1 sector, so that sector has to load more from disk to get a full OS running. Most bootloaders do not load the kernel directly, though: the first sector loads a second stage bootloader, which then loads the kernel.
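
To show what that 512-byte sector looks like, here is a small Python sketch that pads some machine code to one sector and adds the 0x55 0xAA signature. The helper build_boot_sector is a made-up name, and the "boot code" is just the two bytes of an infinite loop.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xAA"  # must be the last two bytes of the sector

def build_boot_sector(code: bytes) -> bytes:
    """Pad machine code to one sector and append the signature (hypothetical helper)."""
    max_code = SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(code) > max_code:
        raise ValueError(f"boot code is {len(code)} bytes, limit is {max_code}")
    return code.ljust(max_code, b"\x00") + BOOT_SIGNATURE

# EB FE is `jmp $`: jump to itself forever, about the smallest boot code possible.
sector = build_boot_sector(b"\xEB\xFE")
print(len(sector), sector[-2:].hex())  # 512 55aa
```

If you wrote these bytes to a file, a BIOS (or an emulator) should accept it as a boot sector, though all it does is hang.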

I/O and Interrupts

But how does the BIOS load the boot sector? It doesn't just happen automatically: it has to use what is called an I/O port or MMIO to interact with the hard disk. An I/O port is a bit like a network port, but on the computer's hardware: you can read and write to it with the in and out instructions, and each port has its own particular function. There are ports for the keyboard and mouse, others for the hard drive, and others for the graphics card. MMIO, or Memory Mapped Input Output, is another way I/O is done. The way it works is that a device is assigned a range of memory addresses, and reading or writing those addresses talks to the device instead of RAM. For example, in text mode a VGA card displays whatever is written to the memory starting at address 0xb8000: each character on screen takes two bytes, one for the character and one for its color.

However, most bootloaders and simple OSes do not interface with the hardware directly like the BIOS does: there are so many types of devices that covering each one would use up the whole boot sector! Instead, they use what is called a BIOS interrupt. These are done with the int instruction. An interrupt basically tells the computer that the OS wants to do something, and the computer will find and run the code the BIOS set up to do that. For example, the BIOS sets up interrupt 0x13 to perform disk services. Thus, to interface with the hard drive, I would simply put the details of my request in registers and call int 0x13, and the BIOS would do what I wanted for me.
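
As an illustration of the VGA example, here is a Python sketch that works out where a character goes in the text buffer and what two bytes represent it. It assumes the standard 80-column text mode, and the function names are made up. It only does the arithmetic; it cannot actually touch video memory.

```python
VGA_TEXT_BASE = 0xB8000   # start of the VGA text buffer
COLUMNS = 80              # assuming the standard 80x25 text mode

def vga_cell_address(row: int, col: int) -> int:
    """Address of the two-byte cell for a screen position."""
    return VGA_TEXT_BASE + (row * COLUMNS + col) * 2

def vga_cell(char: str, foreground: int = 0x0F, background: int = 0x00) -> bytes:
    """Character byte followed by attribute byte (background in the high nibble)."""
    attribute = ((background & 0x0F) << 4) | (foreground & 0x0F)
    return bytes([ord(char) & 0xFF, attribute])

print(hex(vga_cell_address(1, 0)))  # 0xb80a0, the start of the second row
print(vga_cell("A").hex())          # 410f: 'A' in white (0x0F) on black
```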

Modes and Memory

Well, BIOS interrupts are great, but they come at a price. When the computer boots up, it is in what is called Real mode or 16-bit mode: you have access to every aspect of the computer with no protections, and you can use BIOS interrupts. The problem is with memory access. The name 16-bit mode comes from the fact that registers and memory offsets are 16 bits wide: an offset can only range from 0 to 65535. Segmentation (covered in the next section) stretches this to 20-bit addresses, but that is still only about 1 MiB. As you might expect, this is a problem: what if an app needs more RAM than that? The solution is called Protected mode or 32-bit mode. In Protected mode, you have access to 32-bit addresses, letting you read and write 4 GiB of RAM. You can also set privileges on memory to only allow certain code to access it. The price of this is that you cannot use BIOS interrupts anymore. Many older operating systems like DOS run in Real mode, but modern ones run in Protected mode or Long mode, which gives you 64-bit registers and addresses.
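
The sizes involved are just powers of two. Here is a quick Python sketch of the arithmetic, with made-up helper names: 16 bits for a single offset, 20 bits for what Real mode can reach through segments, and 32 bits for Protected mode.

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits

def human_size(byte_count: int) -> str:
    """Format a byte count using binary units (hypothetical helper)."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if byte_count < 1024:
            return f"{byte_count} {unit}"
        byte_count //= 1024
    return f"{byte_count} PiB"

for bits in (16, 20, 32):
    print(bits, human_size(address_space_bytes(bits)))
# 16 64 KiB
# 20 1 MiB
# 32 4 GiB
```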

Segmentation, Paging, and the GDT

How does Real mode reach addresses beyond 65535 with only 16-bit registers? It uses something called segmentation to access memory. Memory is divided into segments, and you access offsets within a segment. For example, you might have the segment 0x1000 and the offset 0x1234, giving you the address 0x1000:0x1234. To convert this to a physical address, we shift the segment left 4 bits and add the offset to that. By applying this, we get (0x1000 << 4) + 0x1234 = 0x10000 + 0x1234 = 0x11234. The computer handles segments using segment registers: cs, ds, ss, es, fs, and gs. These are, respectively: the code segment, where the program runs, the data segment, where data is stored, the stack segment, for the stack, the extra segment, for general use, and two more extra segments that were added later. Because the segment and offset are both 16 bits, we can only access addresses that will fit in a segment:offset address: the highest is 0xFFFF:0xFFFF = 0x10FFEF, just over 1 MiB.

Protected mode does things differently. Before you enter protected mode, you must set up something called a GDT, or Global Descriptor Table. This is where you define the segments you want to access memory through: for example, you could set up your OS to only use addresses 0x10000 to 0x90000. In practice, most protected mode OSes define flat segments that cover the whole address space and rely on something called paging for memory management instead. In paging, you use what is called virtual memory. This allows you to have each process think that it has access to all 4 GiB of memory and that it runs at 0x0000, even if it does not. In paging, virtual and physical memory are divided into pages, usually 4 KiB each. Each page in virtual memory is mapped to a page in physical memory. The mappings are stored in page tables, and the page tables are in turn listed in a page directory, whose address is given to the CPU so it can look translations up.
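
Both address calculations are easy to try out. The Python sketch below does the segment:offset conversion from the example, and also shows how a 32-bit virtual address is split up under classic 32-bit paging with 4 KiB pages (10 bits for the page directory index, 10 for the page table index, 12 for the offset). The function names are made up, and it ignores details like address wraparound on very old CPUs.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address for a real mode segment:offset pair."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)

def split_virtual_address(address: int) -> tuple:
    """Split a 32-bit virtual address into (directory index, table index, page offset)."""
    directory_index = (address >> 22) & 0x3FF  # top 10 bits pick a page table
    table_index = (address >> 12) & 0x3FF      # next 10 bits pick a page
    page_offset = address & 0xFFF              # low 12 bits: byte within the 4 KiB page
    return directory_index, table_index, page_offset

print(hex(real_mode_address(0x1000, 0x1234)))  # 0x11234
print(hex(real_mode_address(0xFFFF, 0xFFFF)))  # 0x10ffef, the highest reachable address
print(split_virtual_address(0x00403025))       # (1, 3, 37)
```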

A Final Note: Emulators

If you want to write your own OS for x86, you will want to learn more about assembly language, the human readable form of machine code, and C. Then, you will need to write a bootloader and kernel, or set up a kernel to boot with GRUB or Syslinux. Finally, you will have a bootable image, which is basically just a file containing your OS as machine code. What do you do now? You need to know if your OS works! You could write the image to a flash drive, reboot your computer, and boot from the flash drive. But this is messy and wastes time. Instead, most OS developers use an emulator for testing. An emulator is simply a virtual machine that runs your OS: it emulates the x86 instruction set, so your OS thinks it is running on real hardware, when in reality it is just running as an app. Two good emulators are QEMU and VirtualBox. VirtualBox is geared more towards those who want to run other OSes on their machine, say to run Linux without installing it. It is easiest to use QEMU to test your OS in development.

The End

This concludes my tutorial on x86. If you liked it, be sure to check back later, as I plan to write other tutorials on Python and perhaps even writing a basic OS. Thanks for reading!

Comments
MrEconomical

insane tutorial

amasad

@sugarfi this is fantastic. I just gave you the content creator badge -- great work!

[deleted]

@amasad thanks a lot for the badge! Glad you liked my tutorial!

Highwayman

@V3rmillionNet
F. Ok now someone really does have to make a replOS.

[deleted]

@Highwayman I tried, but QEMU doesn't run on repl.it

Highwayman

@sugarfi I mean, repl.it doesn’t necessarily have to be the ide you develop it on, it can just be the site upon which you distribute the code. It’d just be geared towards repl.it developers I guess. But I guess we’ll just have to go for a replShell. 😞 that’s too bad.

[deleted]

@Highwayman I have tried writing an OS before, but I always gave up or got bored before it went anywhere. I am working on another one now, though

Highwayman

@sugarfi tbh I’d probably get bored of it even if I had the ability lol, but cool. What do you think it’ll be like?(your os)

[deleted]

@Highwayman I am aiming for a DOS or Unix like thing, with a basic command line and app support

Highwayman

@sugarfi "app support" ?

[deleted]

@Highwayman you could download and run apps

Highwayman

@sugarfi oh lol my brain how did I not get that.

Shigetorum

@Highwayman damn, i hate the fact that the internet is capable of teaching us anything

Anyways time to code a os

Highwayman

@sugarfi lol 01011000 01000100

[deleted]

@Highwayman 170 104 base 8

Highwayman

@sugarfi
-..-
-..
I was gonna do that one earlier, but I wasn’t sure how to make it clear that it was base 8 lol.

Highwayman

@sugarfi what’s that using? Never seen that before.

[deleted]

@Highwayman that's base64.

Highwayman

@sugarfi oh lol, that’s actually the one I was gonna do. hm... this is hard finding new encodings...hm.. this might take a bit lol I think I’ve lost 😂

Highwayman

@sugarfi O.o
I searched for an hr. Wat. Ok I did lose lol.

[deleted]

@Highwayman you could still use base 32, or better yet base 69....

Highwayman

@sugarfi lol 69. I saw a post on SO the other day now that I think about it that was just about reducing the amount of space raw binary would take up when encoded. The problem was everything else was horribly slow, so I think eventually they just went with base64.
I think if I can figure out base 32....

Highwayman

@sugarfi nope. I just don’t get how they work 😓

[deleted]

@Highwayman just use python int(s, base=32)

staticvoidliam7

im working on one kinda @Highwayman

staticvoidliam7

can i just link it? @Highwayman

Highwayman

@LiamDonohue yes absolutely! :)

[deleted]

@LiamDonohue cool, i would love to see what you have so far!

Highwayman

@LiamDonohue this seems to be a shell more than an os?

staticvoidliam7

yeah, that's why I said kinda because it's kinda a weird breed lol @Highwayman

[deleted]

@Highwayman @LiamDonohue yeah, this is not a bootable program, it is a console app

staticvoidliam7

it's kinda an example of what I might make for an OS @sugarfi

[deleted]

@LiamDonohue oh, cool

staticvoidliam7

@highwayman @sugarfi do yall want to get on a multiplayer and start one? lol also try out https://repl-mail.mreconomical.repl.co/mail

staticvoidliam7

@LiamDonohue btw not my website

Highwayman

@LiamDonohue I’m not good at collab, strange as that sounds.

staticvoidliam7

do you have a repl mail accnt? im curious how many people do @Highwayman

Highwayman

@LiamDonohue I would, but repl mail’s blocked for me :(

staticvoidliam7

? how is repl.co blocked? @Highwayman

Highwayman

@LiamDonohue everything is blocked for me except for repl.it. Maybe if I can figure out if they also host the servers under the repl.it domain....

Highwayman

@Highwayman well actually not everything but most things

staticvoidliam7

ahh ok pretty much the same with me I have to request unblocks from the administrator @Highwayman

Highwayman

@LiamDonohue same. Except my admin is my mother, so that never gets anywhere lol.

staticvoidliam7

lol mine is my father so it's a bit different @Highwayman

Highwayman

@LiamDonohue XS I hate whitelist restrictions :(

staticvoidliam7

I had an idea for a programming language: uwuscript @Highwayman

Highwayman

@LiamDonohue UWU that’d be fun, all owos and uwus and idk something else lol

staticvoidliam7

yeah ima start working on it lol @Highwayman

Highwayman

@LiamDonohue I’ve just been trying to make head or tail of networking C++ rn, so I’m just too dead inside to make up good ideas like that lol. How’s it coming along?

staticvoidliam7

well im working on something called THAIL (Technical High-Level Abstract Language) but i just had the uwuscript idea @Highwayman

staticvoidliam7

here's an example:

UwU 1.0 hewwo "Hello world!"

@Highwayman
(UwUScript)

Highwayman

@LiamDonohue that’s actually nice lol. What’s THAIL? it sounds super cool.

[deleted]

@Highwayman for unblocking websites, sometimes you can use https://replbox.repl.it/data/web_hosting_1/MrEconomical/repl-mail or something similar, but it looks like it doesn't load stylesheets.

Highwayman

@sugarfi it does load style sheets, but it doesn’t host servers. I think. I’m still trying to figure out the servers.. 🤷‍♂️

[deleted]

@Highwayman what do you mean? that is how repl.it does their hosting as far as i know.

Highwayman

@sugarfi well... idk, let me try it again. Maybe it’s under a different web_hosting folder or something. Idk. Let me see.

Highwayman

@sugarfi wow. Yeah not stylesheets and I remember why. You’d have to figure out how to resolve the link for replbox.repl.it too. And everything else doesn’t work too because of all the missing js and links and stuff. Yeesh. It’s terrifying.

staticvoidliam7

a programming language im working on, i just have to finish the documentation then i can start coding it @Highwayman

Highwayman

@LiamDonohue oh! Well it sounds super cool :3

staticvoidliam7

ikr? if you would like to help, just tell me @Highwayman

Highwayman

@LiamDonohue :/ I’ve never done that kinda thing before and I’d just end up dragging you down since I’m so bad at collab. thank you for the offer though :)

[deleted]

@LiamDonohue you're making a language too? what kind?

staticvoidliam7

a programming language called THAIL (stands for Technical High-Level Abstract Intermediate Language) @sugarfi

staticvoidliam7

yeah, the idea randomly came to me in class the other day @sugarfi

[deleted]

@LiamDonohue what does the syntax look like?

staticvoidliam7

so its like a weird combination between Visual Basic, Python, and JavaScript
here's a hello world program

print = "hello".

the period at the end of the statement is required or a syntax error would be thrown
@sugarfi

ApoorvSingal

@sugarfi @Highwayman QEMU works on replit. You just need to manually copy the missing files iirc. Also, @CSharpIsGud already made replos https://repl.it/talk/share/ReplOS-A-REAL-Operating-System-on-replit/30207

[deleted]

@ApoorvSingal their replos is bad though

ApoorvSingal

@sugarfi Maybe but they claimed the name lol.

Highwayman

@ApoorvSingal I don’t see any legal documents or proclamations and even @CSharpIsGud admitted that it was unfinished and broken n’ stuff. I wouldn’t call that a very solid claim..

Skyview

@Highwayman Just use a proxy like Alloy or Corrosion

Highwayman

but with whitelist restrictions i wouldn't be able to access proxies either. @Skyview

Skyview

@Highwayman oh yeah forgot about that too. Hmm depending on where the location of the whitelist restrictions, you could write a custom DNS that would resolve the allowed sites to a http proxy, then allowing you to access other sites.

Highwayman

... this is just getting increasingly infeasible. @Skyview

RattiSriri

hi

[deleted]

Are you going to do a x86_64 tutorial?

[deleted]

@Ryanand sorry! I don't really know anything about 64 bit, only 16 and 32 bit.

[deleted]

@sugarfi no prob just askin if you would make one

MarcusWeinberger

check out copy.sh

Highwayman

Sick tutorial

Daiga

THANKS!

staticvoidliam7

ok whats the difference between x86 and x64? doesn't windows use x64?

[deleted]

@LiamDonohue windows can use either. I believe that x86 is 32-bit, and x64 is 64-bit.

[deleted]

@LiamDonohue it depends on the hardware.

CSharpIsGud

You may want to reword "Why is it that Real mode OSes can only use 16-bit addresses? The reason is that they use something called segmentation to access memory." Segmentation lets you use addresses higher than 65536, it's just that you have to go through segments instead of directly accessing an address.

[deleted]

@CSharpIsGud true, maybe i will have to reword that