x86 Assembly Tutorial Part 1 (it’s big I promise...)

x86 assembly time.

And yes, I know a tutorial on this was made already, but as much as you may or may not be a fan of its author, I think we can all admit it wasn't great.

I know, I know. "Cycle squeezing" blah blah blah. I don't write here often but I wanted to get something out. I really don't care about the virtual points. For me the upvotes are about who gets to see the tutorial. If it so deeply offends you that I dare release my writing in multiple parts, just click off.

And without further ado, let us begin

Getting set up

To start, go here

And delete the code that is already there.

Computers at a low level

A computer at a low level is, roughly speaking, just an ALU (arithmetic logic unit) communicating with memory.

The ALU is the part of the CPU that performs arithmetic and logic operations.

The memory contains data and code.

This code is represented in binary (1s and 0s).

The code may look something like this.

0010 - 0101
 ^      ^
 |      memory address
 add-1 instruction

The first 4 bits (a bit is a single 1 or 0) are the operation code, or opcode.

The opcode tells the computer what to do with the rest of the info.

The next 4 bits are the operand.

The operand is the data that the opcode works on.

Assembly works very similarly to this. The mnemonic at the start of a line stands for the opcode, and the data that follows are the operands.
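To make the mnemonic/operand split concrete, here is a minimal sketch in NASM syntax. The add instruction is not covered in this part, and the sections, _start and int 0x80 lines are explained later in the tutorial; the only point here is the shape of each line, a mnemonic first and then its operands.

```nasm
section .text
global _start

_start:
    ; mnemonic  operands
    mov     ebx, 2      ; opcode: mov, operands: ebx and 2
    add     ebx, 5      ; opcode: add, operands: ebx and 5 (ebx is now 7)

    mov     eax, 1      ; sys_exit
    int     0x80        ; exit, using the value in ebx (7) as the exit code
```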

In assembly we also don't work with variables.

You either work with the stack, registers, pointers, or the heap.

In our journey you will get a newfound understanding of programming. For example, a while loop won't just look like this anymore:

#include <stdbool.h>
#include <stdio.h>

int main() {
    while (true) {
        // while this statement is true, repeat.
        printf("Hello this is a while loop");
    }
    return 0;
}

It will look more like this (a check of the condition, then a jump back to the start, written here in C++ as a function that calls itself):

#include <stdio.h>

int while_example(bool& condition) {
    if (condition) {
        printf("It's true!");
        // go back to the top and test the condition again
        return while_example(condition);
    } else {
        return 0;
    }
}

int main() {
    bool x = true;
    while_example(x);
    return 0;
}
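For comparison, here is a rough sketch of what a loop tends to boil down to in NASM itself: a label, a comparison, and a conditional jump back to the label. The inc, cmp and jne instructions are not covered in this part, and the label name loop_top is made up for illustration.

```nasm
section .text
global _start

_start:
    mov     ebx, 0          ; ebx is our loop counter

loop_top:                   ; hypothetical label marking the top of the loop
    inc     ebx             ; counter = counter + 1
    cmp     ebx, 3          ; compare the counter with 3
    jne     loop_top        ; if they are not equal, jump back and repeat

    mov     eax, 1          ; sys_exit
    int     0x80            ; the exit code is the counter, which is now 3
```

Unlike the endless loops above, this one stops after three passes so the program can exit cleanly.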

This tutorial is geared less toward the language itself and more toward the experience and knowledge gained from learning x86 Assembly.

Writing our first x86 Assembly code

Anyway let's start (for real this time).

In the IDE website let's type some code.

As I said earlier, x86 Assembly code doesn't have variables.

I did note, however, that there are registers you can use.

These registers reside in the CPU and they are very fast.

The registers we will be using for now are as follows.

  • eax
  • ebx
  • ecx
  • edx

These registers are just general-purpose places to store and manipulate data. They are usually temporary, and if you wanted to store something for longer you could use the bss or data section, which we will talk about later.

Anyway, let's write some code!

mov eax, 8

Write that in the IDE website.

Let us discuss the syntax of that code.

The mov statement is used to say to the computer we want to move some data.

This instruction takes two operands, a location and data.

The data can be a location as well.

In this case we are moving the number 8 into the register eax.

So the syntax is:

mov location, data
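A small sketch of the two forms of mov described above, one where the data is a number and one where the data is itself a location. It is wrapped in the program skeleton and exit call that the later sections explain, so treat those lines as scaffolding for now.

```nasm
section .text
global _start

_start:
    mov     eax, 8      ; location: eax, data: the number 8
    mov     ebx, eax    ; location: ebx, data: whatever eax holds (8)
    mov     ecx, ebx    ; the data can be a location as well

    mov     eax, 1      ; sys_exit
    int     0x80        ; exits with ebx (8) as the exit code
```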


To make a comment we use ;.

So we do it like so:

; this is a comment!

Basic Syscall 0x80

Don't be frightened by the title, this is very easy. At the start it may seem complicated, but once you see the bigger picture this will be a piece of cake.

If we write this line of code

int 0x80

What happens?

Well, this line is known as an interrupt and it is how we perform system-centered tasks.

This interrupt takes a parameter stored in eax.

So, if we wrote this

mov eax, 7
int 0x80

Then that would work, right?

Not quite.

The system interrupt usually needs data from other registers too.

The value stored in eax tells this specific interrupt handler what to do.

In this case we want the interrupt to end the program.

The value we need is 1.

mov eax, 1
int 0x80

Would kind of work.

But this syscall takes another argument.

This argument lives in the ebx register.

This argument is the exit code.

0 is the exit code for success.

So, you could think of this as a function definition and a call to it, like this.

int syscall_exit(int ebx) { doSomething(); }
syscall_exit(0);

So this code:

mov eax, 1
mov ebx, 0
int 0x80

Would work.


So, until now your code hasn't worked correctly.

This is due to the fact that we don't have an entry point or an _start symbol.

The _start symbol is needed to start our code.

So, we write this syntax:

label_name:

So for our purposes the label_name is _start.

So we write:

_start:
    <code>

How about we paste our old code in there?

_start:
    mov eax, 1
    mov ebx, 0
    int 0x80

It still won't work correctly!

We need to expose the label to the linker (the tool that turns our assembled code into the final executable).

This makes our label global so the linker can see it.

So we use this syntax:

global <symbol>

In our case the symbol is _start.

And we do:

global _start
_start:
    mov eax, 1
    mov ebx, 0
    int 0x80
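As a variation on the program above, here is a sketch that exits with a non-zero code so you can actually see the argument in ebx doing something. The value 42 is arbitrary, and the section .text line is explained in the next section.

```nasm
section .text
global _start

_start:
    mov     eax, 1      ; syscall number 1: sys_exit
    mov     ebx, 42     ; exit code (only the low 8 bits, 0 to 255, are reported)
    int     0x80        ; hand control to the kernel
```

If you run this from a Linux shell, typing echo $? right afterwards should print the exit code of the program.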

The text and data sections

So far all of our code would be referred to as text section code.

Let us outline the differences between the text and data sections:


  • text
    • Stores code
    • Labels
    • Syscalls
    • Move statements
  • data
    • stores data
    • defining bytes
    • memory
    • pointers

Ok, so to define a section we use the syntax:

section <name>

In our first case the name is .text.

Let us add this to our old code.

section .text
global _start
_start:
    mov eax, 1
    mov ebx, 0
    int 0x80

Everything below the section definition is now part of that section.

Ok, let us test out the .data section.

section .text
global _start
_start:
    mov eax, 1
    mov ebx, 0
    int 0x80
section .data
; now what?

Let us define a byte.

To define a byte we use the syntax <name> db <data>.

So how about we define a byte called x with a value of 99 and try to exit our program with x.

Ok, let us define our byte.

section .text
global _start
_start:
    mov eax, 1
    mov ebx, 0
    int 0x80
section .data
x db 99

Ok, now we exit our program with x.

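section .text
global _start
_start:
    mov eax, 1
    movzx ebx, byte [x] ; load the byte stored at x into ebx
    int 0x80
section .data
x db 99

Ok, what is going on here?

Well, like we said, we defined x with x db 99.

We then did the standard exit, but instead of our 0 we put in the value stored at the memory location x refers to. In NASM a bare x means the address of the byte, so we wrap it in square brackets to read what is stored there, and movzx copies that single byte into ebx while filling the rest of the register with zeros.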
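The difference between the address of x and the byte stored at x is easy to trip over, so here is a minimal sketch that uses both. It assumes NASM syntax, where a bare label is an address and square brackets read the memory at that address.

```nasm
section .text
global _start

_start:
    mov     ecx, x              ; ecx = the address of x (where the byte lives)
    movzx   ebx, byte [ecx]     ; ebx = the byte stored at that address (99)

    mov     eax, 1              ; sys_exit
    int     0x80                ; exit code 99

section .data
    x db 99
```

If you exited with ecx's value instead, the exit code would be the low 8 bits of an address, which is rarely what you want.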

And there!

Hello, World!

Ok, let us use our knowledge we have gained thus far and do a "Hello, World!" program!

So, firstly we need to talk about int 0x80 again.

Like I said, this is a syscall.

Meaning the operating system handles it.

So, we need to figure out which one allows us to write to the screen.

Firstly, let me teach you about stdout.

stdout (standard output) is the stream the system uses to handle output to the terminal.

So what we need to do is write to that stream.

The syscall we just did with the code of 1 was the sys_exit syscall.

The syscall we need is sys_write.

The code for this syscall is 4, and it takes arguments in all four of the registers you have learned so far.

The functions of these registers in sys_write are as follows.

  • eax: the code (4)
  • ebx: file descriptor (in our case 1 for stdout)
  • ecx: data (the address of the bytes to write)
  • edx: data size

Ok, let us go over that a bit more.

The file descriptor in ebx is just a small number that identifies an open file or stream; we will use this when we do file I/O. In our case, the operating system knows that 1 is stdout.

The data is the output we want, and ecx holds the address where it starts in memory. Whatever bytes sit there are written as-is, so to print readable text they need to be ASCII characters. In our case we will be using an ASCII string we define.

The data size is how big our data is in bytes. Since one ASCII character is 1 byte, we could just count the number of characters if we wanted to; however, we will be using a different way.
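Here is a minimal sketch of the four registers in action, with the length counted by hand. The label greeting and its three-character text are made up for this example; the tutorial's own Hello, World! program is built up step by step below.

```nasm
section .text
global _start

_start:
    mov     eax, 4          ; the code: 4 means sys_write
    mov     ebx, 1          ; file descriptor: 1 is stdout
    mov     ecx, greeting   ; data: the address of the bytes to write
    mov     edx, 3          ; data size: 'H', 'i' and '!' are 3 bytes
    int     0x80            ; ask the kernel to write them

    mov     eax, 1          ; sys_exit
    mov     ebx, 0          ; exit code 0
    int     0x80

section .data
    greeting db "Hi!"       ; hypothetical example string
```

Counting by hand works, but it breaks the moment you edit the string, which is why the tutorial goes on to let the assembler do the counting.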

Ok, let's start by defining our string and moving the first data.

section .text
global _start
_start:
    mov eax, 4
    mov ebx, 1
    mov ecx, message
section .data
message db "Hello, World!"

Ok, you might think we are done with that part, but you'd be wrong.

You see we also want a newline on the end.

The problem is that inside a double-quoted string like ours, assembly doesn't have a \n or a std::endl like some languages.

In assembly we have to reference the ASCII code for newline.

That would be 10, but we are going to write it in hexadecimal as 0x0a.

So we are just going to tack this on the end like this:

section .text
global _start
_start:
    mov eax, 4
    mov ebx, 1
    mov ecx, message
section .data
message db "Hello, World!", 0x0a

And there we go!

We defined our string, we moved it into ecx and we now have a newline!

Now let's calculate the length.

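We are assembling with NASM behind the scenes, and NASM has a special token, $, that stands for the current position in the assembled output. Subtracting the start of our data from it gives us the length.

The expression is:

$-<name> ; after the - put the name of your data.

So in our case it is:

$-message

Now we need to give this value a name.

So we use equ!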

The syntax is:

<name> equ <data>

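It's written much like db, but instead of storing bytes it gives a name to a constant that the assembler works out when it assembles the code.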
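A small sketch of equ and $ on their own, assuming NASM. The names greeting and greeting_len are made up for this example, and the program simply exits with the computed length so you can check it.

```nasm
section .text
global _start

_start:
    mov     eax, 1              ; sys_exit
    mov     ebx, greeting_len   ; the assembler substitutes the constant 5 here
    int     0x80                ; exit code 5

section .data
    greeting     db "hello"         ; 5 bytes stored in memory
    greeting_len equ $-greeting     ; current position minus start = 5, no bytes stored
```

Note that the equ line has to come straight after the data it measures, because $ is the position at that exact point in the file.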

So we can write:

section .text
global _start
_start:
    mov eax, 4
    mov ebx, 1
    mov ecx, message
    mov edx, message_len
section .data
message db "Hello, World!", 0x0a
message_len equ $-message

Ok, now we defined our length and put the length inside of edx.

Now, let us print using int 0x80.

section .text
global _start
_start:
    mov eax, 4
    mov ebx, 1
    mov ecx, message
    mov edx, message_len
    int 0x80
section .data
message db "Hello, World!", 0x0a
message_len equ $-message

And, we are done!

I'd recommend, just as good practice, that we exit our code with sys_exit.

If you want to do that just tack:

mov eax, 1
mov ebx, 0
int 0x80

On the end of _start.
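Putting it all together, the finished program would look roughly like this. It only combines the pieces from this tutorial; the comments are mine.

```nasm
section .text
global _start

_start:
    ; sys_write(stdout, message, message_len)
    mov     eax, 4              ; syscall number 4: sys_write
    mov     ebx, 1              ; file descriptor 1: stdout
    mov     ecx, message        ; address of the string
    mov     edx, message_len    ; number of bytes to write
    int     0x80

    ; sys_exit(0)
    mov     eax, 1              ; syscall number 1: sys_exit
    mov     ebx, 0              ; exit code 0: success
    int     0x80

section .data
    message     db "Hello, World!", 0x0a    ; the text plus a newline
    message_len equ $-message               ; 14 bytes, computed by NASM
```

If you are assembling on your own Linux machine instead of the online IDE, something like nasm -f elf32 hello.asm -o hello.o followed by ld -m elf_i386 hello.o -o hello should build it (hello.asm is just a placeholder file name).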


Ok, you have started your journey of x86 Assembly!

There will be more parts coming soon.

If you liked this and want more people to see it, share it with your friends and/or upvote it.

javascript is better, we need to type that much code for hello world whereas in JS we can do this -

console.log('hello world')

or in python simply

print('hello world')

I'm not sure but in ruby it's like -

puts "hello world"

Hello world in 35 languages

can you manipulate memory and hardware in JS? Can you load a GDT in Python?

maybe your comment is a joke? if not, you missed the point. clearly, for most tasks, higher-level (3rd gen) languages are 'better' in that you write less code and the machine does more. assembly and variants are considered SECOND generation (2nd gen) languages--1st gen being machine code itself. assembly was created as 'shorthand' so that people didn't have to write 0s and 1s...just like C/C++/Java/JavaScript/Python/etc were created so that people didn't have to write in assembly. we all stand on the shoulders of giants, my friend. :)

Hippity hoppity this code is now my property cntrlc + cntrl v go BRRRRRRRRRRRRR

Dani right?

yup lol

Very informative! Upvoted :)

Yeah, surprisingly I knew a bit about X86 yet I know almost nothing about it!

Very well done! 👏👏👏

Nice :), very informative

Thanks! If you liked it please upvote so more people will see it. I really want to start an interest in x86 Assembly among replitors so repl might add assembly and so assembly will become more of an interest among this community :-)


Hello, great job on the tutorial take my upvote. Repl does have nasm installed in a lot of repls (only polygott (their main docker image) based ones)

Maybe I'll learn and program with some x86 assembly


Keep up the good work

The problem with low level machine languages is that it really (and I emphasise on the really) depends on the machine and the assembler and disassembler... right?
I mean, there's NASM, GAS, HLA and much more. The problem with writing in assembly is oftentimes the instructions are different too.

X86 is one of the more common assembly languages, but there are still so many out there (as well as different assemblers/disassemblers out there).

Also, you might want to teach them about assemblers/disassemblers too, because that's an important part of assembly code.

The problem with low level machine languages is that it really depends on the machine and the assembler and disassembler... right?

Assemblers map instructions to binary; there is no loss or change when round-tripping.

How is this a problem though?

By being as close as possible to the hardware, you have complete control over the code, you can set up a perfect code execution schedule, you can prevent false register dependencies that compilers can't even detect, and, by knowing exactly what instructions the target CPU has, you can make use of them! If that CPU has SIMD operations and the hardware has the registers, perfect, make use of them! If that CPU has a popcnt instruction, and that's the operation that you're looking for, look no further, the hardware implementation will execute everything in parallel and in constant-time.

How is having different instructions problematic?

Cross-compatibility with other systems. Although, yes controlling something like CPU scheduling seems really nice and convenient.

Cross compat is a high-level thing, the lower you get, the less that matters.

How much stuff have humans developed specifically for use on Earth? Shoot, I guess most of it won't be compatible with Mars' environment & atmosphere when we make it there. Lmao.

True. I still think Earth will be used a lot!

Maybe other planets will be used for the actual mining and stuff, but eh.

True, I don't see that well compatibility with C/C++ (and they are considered "high-level", at least C++ is as I have heard, but they are getting moved to "lower-level" languages).