x86 Assembly Tutorial Part 2 (again, it's big, I promise...)
Wuru (44)

Assembly Tutorial Part 2

Ello.

Let's just hurry into part 2 :-).

The stack

Let me introduce a concept to you.

There is a thing on your computer called the stack.

It is more of a concept than a thing.

It resides in your RAM and it is what is known as a Last in, First out or LIFO data structure.

Let us discuss what that is for a second.

LIFO data structures

Imagine you have a stack of books.

This stack of books has a top and some stuff under it.

Let us say that there are 5 books currently on this stack.

Let us order them like this:

  • Book #5
  • Book #4
  • Book #3
  • Book #2
  • Book #1

And you have another Book #6 off to the side.

If you wanted to take off Book #4 you would first have to take off Book #5.

Taking a book off the top like that is known as popping from the stack.

If you were to put Book #6 on top that would be pushing to the stack.
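
The book pile can be sketched in C, if that helps. This is only a rough illustration: the names book_stack, push_book, pop_book and take_book_4 are made up for this example, and the capacity of 8 is an arbitrary assumption.

```c
#include <assert.h>

#define CAPACITY 8 /* arbitrary limit chosen for this sketch */

typedef struct {
  int books[CAPACITY]; /* books[0] is the bottom of the pile */
  int count;           /* how many books are on the pile */
} book_stack;

/* Put a book on top of the pile. */
void push_book(book_stack* s, int book) {
  assert(s->count < CAPACITY); /* the pile must not be full */
  s->books[s->count] = book;
  s->count = s->count + 1;
}

/* Take the top book off the pile and return its number. */
int pop_book(book_stack* s) {
  assert(s->count > 0); /* the pile must not be empty */
  s->count = s->count - 1;
  return s->books[s->count];
}

/* Build the pile from the example (Book #1 at the bottom, Book #5 on top),
   then dig down to Book #4. Returns the last book that was popped. */
int take_book_4(void) {
  book_stack pile = { {0}, 0 };
  int book;
  for (book = 1; book <= 5; book++) {
    push_book(&pile, book);
  }
  pop_book(&pile);        /* Book #5 has to come off first */
  return pop_book(&pile); /* now Book #4 is on top */
}
```

There is no way to reach a book in the middle directly. The only operations are push_book and pop_book, and both work on the top of the pile.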

Let us look at a more relevant example with data.

Data LIFO

Let us say you have 1 register, and let's say it is eax.

And you have a stack.

The stack has a pointer register called esp.

This pointer register points to the top of the stack.

Let us have a visual of all of this data with a few pushes and pops.

Before anything:

esp: 16

eax: 0

stack:
(16)   -  esp *points* here, meaning that its value is this memory location (nothing has been pushed yet)
12: 0
8: 0
4: 0
0: 0

The number in front of each place is its memory address.

Notice that esp starts at 16. This is because on x86 processors the stack starts at a high address and grows downward.

Also, we only have 4 places in the stack (a real stack would be much bigger than this), but esp is at 16.

This is because each place in the stack is 4 bytes and each memory location is 1 byte, so we take the number of places in the stack and multiply it by 4 to get the starting value of esp.

Ok, so now let's push 9 to the stack. push first subtracts 4 from esp and then stores the value where esp now points.

esp: 12

eax: 0

stack:
12: 9  -  esp *points* to this, meaning that its value is this memory location
8: 0
4: 0
0: 0

Now let's pop 9 back out into eax. pop copies the value that esp points to and then adds 4 to esp.

esp: 16

eax: 9

stack:
(16)   -  esp *points* here again
12: 9  -  the 9 is still in memory, but it is no longer part of the stack
8: 0
4: 0
0: 0

For you C programmers out there I will also provide a structure-form of a stack.

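typedef struct {
  int esp;     /* byte offset of the top of the stack */
  int size;    /* number of 4-byte places */
  int data[8]; /* the places themselves */
} stack;

void push(stack* s, int data) {
  s->esp = s->esp - 4;        /* move esp down first... */
  s->data[s->esp / 4] = data; /* ...then store at the new top */
}

void pop(stack* s, int* location) {
  *location = s->data[s->esp / 4]; /* read the value on top... */
  s->esp = s->esp + 4;             /* ...then move esp back up */
}

int main() {
  int eax = 0;
  stack myStack = { 32, 8, {0, 0, 0, 0, 0, 0, 0, 0} };
  push(&myStack, 9);
  pop(&myStack, &eax); /* eax is now 9 and esp is back at 32 */
  return 0;
}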

The stack is actually placed upside-down in RAM as well: it grows from higher addresses toward lower ones.

The stack (continued)

OK, so how do we do pushes and pops?

Well, let us review each separately.

Push

Syntax:

push <value>

value can be a register, a pointer, straight-out data (such as just the number 5), or a few other things that we will discuss later.

Pop

Syntax:

pop <location>

location can be a register, a pointer, or a few other things that we will discuss later.

Ok! Now you have a decent understanding of the stack! You can use it to store things, preserve things, and to pass arguments to labels.

Basic Math!

Add

add takes 2 arguments: the first is a location and the second is a location or straight-out data.

So like this:

add eax, 1

It adds them and saves the value into the first argument which is the location.

Sub

sub also takes 2 arguments: the first is a location and the second is a location or straight-out data.

So like this:

sub eax, 1

It subtracts the second from the first and saves the value into the first argument which is the location.

You could think of both add and sub like so:

void add(int* location, int data) {
  *location = *location + data;
}

void sub(int* location, int data) {
  *location = *location - data;
}

int main() {
  int x = 9;
  add(&x, 10); // x should now be 19
  sub(&x, 10); // x should again be 9
  return 0;
}

Dec and Inc

Both dec and inc take 1 argument which must be a location.

dec subtracts 1 from the value at the location and inc adds 1 to it.

You could think of them like...

void inc(int* location) {
  *location = *location + 1;
}

void dec(int* location) {
  *location = *location - 1;
}

int main() {
  int x = 9;
  inc(&x); // x should now be 10
  dec(&x); // x should now be 9 again
  return 0;
}

The jmp and call instructions.

If we define a label alongside our _start label like so...

section .text

global _start

_start:
  ; code...

  ; exit our program
  mov eax, 1
  mov ebx, 0
  int 0x80 ; syscall

add20:
  ; this adds 20 to eax
  add eax, 20

How do we call it?

Well, we use the jmp instruction!

So, let's do that.

jmp syntax:

jmp <label>

So let's update our code.

section .text

global _start

_start:
  ; code...

  jmp add20

  ; exit our program
  mov eax, 1
  mov ebx, 0
  int 0x80 ; syscall

add20:
  ; this adds 20 to eax
  add eax, 20

The problem is that jmp never comes back, so the exit code below the jmp doesn't get executed.

There is, however, a way around this: the finicky but useful call instruction!

The call instruction has this syntax:

call <label>

Let us update our code with the call instruction.

section .text

global _start

_start:
  ; code...

  call add20

  ; exit our program
  mov eax, 1
  mov ebx, 0
  int 0x80 ; syscall

add20:
  ; this adds 20 to eax
  add eax, 20

Still doesn't work...

That is because we never returned from our label.

We do this using the ret instruction!

So, this instruction has no syntax other than the keyword ret.

So let us add that to the end of add20

section .text

global _start

_start:
  ; code...

  call add20

  ; exit our program
  mov eax, 1
  mov ebx, 0
  int 0x80 ; syscall

add20:
  ; this adds 20 to eax
  add eax, 20
  ret

Let's see if our code is working by adding a bit of extra code to put 20 in eax before calling add20 and exiting with the result (which should be 40).

section .text

global _start

_start:
  ; code...

  mov eax, 20

  call add20

  ; exit our program
  mov ebx, eax
  mov eax, 1
  int 0x80 ; syscall

add20:
  ; this adds 20 to eax
  add eax, 20
  ret
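
If you are wondering how ret knows where to go back to: call pushes the address of the instruction after it onto the stack, and ret pops that address and jumps to it. Here is a rough C model of the program above. It is only a sketch with made-up names (op, program, run), and it assumes a tiny array for the return addresses instead of the real stack that esp points to.

```c
#include <assert.h>

enum op { MOV_EAX_20, CALL_ADD20, MOV_EBX_EAX, EXIT, ADD_EAX_20, RET };

/* The program from above, one instruction per slot. */
static const enum op program[] = {
  MOV_EAX_20,  /* 0 */
  CALL_ADD20,  /* 1 */
  MOV_EBX_EAX, /* 2 */
  EXIT,        /* 3 */
  ADD_EAX_20,  /* 4: this is where the add20 label is */
  RET          /* 5 */
};

#define ADD20 4

/* Runs the program and returns the exit status (the value in ebx). */
int run(void) {
  int eax = 0, ebx = 0;
  int stack[4]; /* holds return addresses */
  int sp = 0;   /* number of entries in use */
  int pc = 0;   /* index of the next instruction */

  for (;;) {
    switch (program[pc]) {
    case MOV_EAX_20:  eax = 20;  pc++; break;
    case MOV_EBX_EAX: ebx = eax; pc++; break;
    case ADD_EAX_20:  eax += 20; pc++; break;
    case CALL_ADD20:
      assert(sp < 4);
      stack[sp++] = pc + 1; /* call: push the address of the next instruction */
      pc = ADD20;           /* then jump to the label */
      break;
    case RET:
      assert(sp > 0);
      pc = stack[--sp];     /* ret: pop that address and jump back to it */
      break;
    case EXIT:
      return ebx;
    }
  }
}
```

Calling run() gives 40, the same as the exit code of the assembly version. A plain jmp would skip the push, so there would be nothing for ret to come back to.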

There ya go! We did it!

Conditional jumps.

Ok. So what if we wanted to jump with a condition?

Let's make a goal for this section. Let's write a program that exits with 0 if eax is greater than 10, otherwise, it exits with 1.

Ok, firstly we need to know how to compare the two numbers.

For this we use the cmp instruction!

The syntax for cmp is as follows:

cmp <x>, <y>

Please note that x must be a location (a register or memory), while y can be a location or data. They can't both be memory locations.

So we do that, now what?

Well cmp sets a few flags inside of the CPU.

If we want to jump based on these flags we use a few special jump statements.

These are exactly like the regular jmp except that they only jump if the flags say so. The syntax is exactly the same; only the keyword is different.

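The most basic are as follows:

  • jg: jump if x was greater than y
  • jl: jump if x was less than y
  • jge: jump if x was greater than or equal to y
  • jle: jump if x was less than or equal to y
  • je: jump if x was equal to y
  • jz: jump if the zero flag is set; after a cmp this is the same as je
  • jnz: jump if the zero flag is not set; after a cmp this is the same as jne
  • jne: jump if x was not equal to y

Note that jg, jl, jge and jle treat x and y as signed numbers.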
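
For the curious, here is roughly what goes on with those flags, written as C. Treat it as a sketch: the flags struct and the functions named after the instructions are made up for this example, and it only models the zero, sign and overflow flags (the real cmp sets a few more, such as the carry flag).

```c
#include <stdint.h>

typedef struct {
  int zf; /* zero flag: the result of x - y was 0 */
  int sf; /* sign flag: the result of x - y was negative */
  int of; /* overflow flag: x - y overflowed as a signed number */
} flags;

/* cmp x, y: work out x - y, keep the flags, throw the result away. */
flags cmp(int32_t x, int32_t y) {
  flags f;
  /* Subtract as unsigned so that wrap-around is well defined in C. */
  uint32_t result = (uint32_t)x - (uint32_t)y;
  int result_negative = (result >> 31) & 1;
  f.zf = (result == 0);
  f.sf = result_negative;
  /* Signed overflow: x and y have different signs and the
     sign of the result differs from the sign of x. */
  f.of = ((x < 0) != (y < 0)) && (result_negative != (x < 0));
  return f;
}

/* Each function returns 1 if the jump would be taken. */
int je(flags f)  { return f.zf; }                  /* same test as jz */
int jne(flags f) { return !f.zf; }                 /* same test as jnz */
int jg(flags f)  { return !f.zf && f.sf == f.of; }
int jge(flags f) { return f.sf == f.of; }
int jl(flags f)  { return f.sf != f.of; }
int jle(flags f) { return f.zf || f.sf != f.of; }
```

So jg(cmp(11, 10)) is 1 and jg(cmp(10, 10)) is 0, which is exactly the behaviour we need for our goal.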

Ok, what we want for our goal is jg.

So let's write our code.

First we define the stuff that we will need.

section .text

global _start

_start:
  mov eax, 11 ; so it will be greater than 10 it should exit with 0

  ; now what?
  
exit0:
  mov eax, 1
  mov ebx, 0
  int 0x80

exit1:
  mov eax, 1
  mov ebx, 1
  int 0x80

Ok, well we want to use jg so we will first compare with:

cmp eax, 10

Then we will jump if greater with jg:

jg exit0

So let's tack:

cmp eax, 10
jg exit0

On the end of _start.

section .text

global _start

_start:
  mov eax, 11 ; so it will be greater than 10 it should exit with 0

  cmp eax, 10
  jg exit0
  
exit0:
  mov eax, 1
  mov ebx, 0
  int 0x80

exit1:
  mov eax, 1
  mov ebx, 1
  int 0x80

Ok, if it doesn't jump, execution just carries on into exit0 anyway (it is the next thing in the file), so we need to jump to exit1 when eax is not greater than 10.

Remember, because it's jmp it doesn't return so we will just tack:

jmp exit1

On the end of _start to catch it if it doesn't jump to exit0.

section .text

global _start

_start:
  mov eax, 11 ; so it will be greater than 10 it should exit with 0

  cmp eax, 10
  jg exit0

  ; if it doesn't jump up there it needs to go to exit1

  jmp exit1
  
exit0:
  mov eax, 1
  mov ebx, 0
  int 0x80

exit1:
  mov eax, 1
  mov ebx, 1
  int 0x80

And there we go!
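
If it helps, the same control flow can be written in C with goto, which is about as close to jmp as C gets. The function name exit_status is made up for this sketch, and the return value stands in for the exit code.

```c
/* Returns the exit code the program would exit with for a given eax. */
int exit_status(int eax) {
  if (eax > 10) goto exit0; /* cmp eax, 10 then jg exit0 */
  goto exit1;               /* jmp exit1 */

exit0:
  return 0;

exit1:
  return 1;
}
```

exit_status(11) gives 0 and exit_status(10) gives 1. Without the goto exit1 line, the code would simply carry on into exit0 no matter what eax was.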

Function Prologue and Epilogue

So as a safeguard programmers invented the Function Prologue and Epilogue, which go at the start and end of labels that are reached with call, to make them safer.

FYI, C uses these.

For example:

int main() {
    return 0;
}

Would compile to:

main:
  push ebp
  mov ebp, esp
  mov eax, 0
  leave
  ret

Or some equivalent.

So what are these lines?

Well, for now you really don't need to know what they mean. Just know that ebp is the base pointer.

Prologue

Put this at the start of your callable labels.

push ebp
mov ebp, esp

Epilogue

Put this at the end of your callable labels.

leave
ret

What leave actually is doing is undoing the code above like so:

mov esp, ebp
pop ebp

This code is to preserve the stack pointer so it doesn't get all messed up.
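
To see why that works, here is a small C model. It is a sketch under assumptions: the cpu struct, the 64-byte stack and messy_function are all invented for this example, and ret itself is not modelled.

```c
#include <assert.h>

#define STACK_BYTES 64 /* arbitrary size for this sketch */

typedef struct {
  int esp;                     /* byte offset of the top of the stack */
  int ebp;                     /* base pointer */
  int memory[STACK_BYTES / 4]; /* 4-byte places */
} cpu;

void push(cpu* c, int value) {
  assert(c->esp >= 4);
  c->esp -= 4;
  c->memory[c->esp / 4] = value;
}

int pop(cpu* c) {
  int value;
  assert(c->esp <= STACK_BYTES - 4);
  value = c->memory[c->esp / 4];
  c->esp += 4;
  return value;
}

/* A made-up function body that leaves the stack "messy":
   it pushes three values and never pops them. */
void messy_function(cpu* c) {
  /* prologue */
  push(c, c->ebp); /* push ebp */
  c->ebp = c->esp; /* mov ebp, esp */

  push(c, 1);
  push(c, 2);
  push(c, 3);

  /* epilogue: this is what leave does */
  c->esp = c->ebp; /* mov esp, ebp */
  c->ebp = pop(c); /* pop ebp */
}
```

Whatever the body pushes, esp and ebp come out with the values they went in with, so the return address is back on top of the stack when ret runs.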

Conclusion.

Let's first write a program with all we have learned so far.

The goal of this program is to increment eax until it is greater than or equal to 100.

It will also put the number of times it has looped into ebx.

Let's write this.

Firstly we need to define _start and our adduntil label.

section .text

global _start

_start:
  mov eax, 50 ; it should go over 50 times

  call loop ; calling the loop

  ; exit with the number of times it has looped
  mov eax, 1
  int 0x80

adduntil:
  push ebp
  mov ebp, esp

  ; now what?

  leave
  ret

Now we can define another label called loop which will loop over.

section .text

global _start

_start:
  mov eax, 50 ; it should go over 50 times

  call loop ; calling the loop

  ; exit with the number of times it has looped
  mov eax, 1
  int 0x80

adduntil:
  push ebp
  mov ebp, esp

  ; now what?

  leave
  ret

loop:
  inc eax
  inc ebx
  ; ???

Now on the end of loop we put the code to check stuff.

section .text

global _start

_start:
  mov eax, 50 ; it should go over 50 times

  call loop ; calling the loop

  ; exit with the number of times it has looped
  mov eax, 1
  int 0x80

adduntil:
  push ebp
  mov ebp, esp

  ; now what?

  leave
  ret

loop:
  inc eax
  inc ebx
  cmp eax, 100
  jl loop
  ; ???

Ok, now we need to cut off the end of _start and give it its own label so we can jump to it.

section .text

global _start

_start:
  mov eax, 50 ; it should go over 50 times

  call loop ; calling the loop

adduntil:
  push ebp
  mov ebp, esp

  ; now what?

  leave
  ret

loop:
  inc eax
  inc ebx
  cmp eax, 100
  jl loop
  jmp end

end:
  ; exit with the number of times it has looped
  mov eax, 1
  int 0x80

And tidy up...

section .text

global _start

_start:
  mov eax, 50 ; it should go over 50 times
  mov ebx, 0  ; the loop counter starts at 0

  call adduntil ; calling the func
  jmp end ; adduntil only returns here if eax was already 100 or more

adduntil:
  push ebp
  mov ebp, esp

  cmp eax, 100
  jl loop

  leave
  ret

loop:
  inc eax
  inc ebx
  cmp eax, 100
  jl loop
  jmp end

end:
  ; exit with the number of times it has looped (ebx)
  mov eax, 1
  int 0x80

And we are done!
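
For comparison, this is the loop the program is aiming for, written as C with goto. It is a sketch only: the C function reuses the name adduntil for convenience, it assumes ebx starts at 0, and the return value stands in for the exit code.

```c
/* Returns the exit code (ebx) for a given starting value of eax. */
int adduntil(int eax) {
  int ebx = 0;                /* assumed starting value of the counter */
  if (eax >= 100) goto end;   /* cmp eax, 100 and the jl loop is not taken */
loop:
  eax = eax + 1;              /* inc eax */
  ebx = ebx + 1;              /* inc ebx */
  if (eax < 100) goto loop;   /* cmp eax, 100 then jl loop */
end:
  return ebx;
}
```

Starting from 50 it goes around 50 times, so adduntil(50) returns 50, and a starting value of 100 or more returns 0 because the loop is never entered.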

In this case, the function prologue and epilogue for adduntil aren't really required, but why not? (If they cause a problem for you, remove them. I just wanted to show them off.)

Please upvote if you liked it, it helps more people see the tutorial :-).

More parts coming soon!

programmeruser (572)

@Wuru you forgot to discuss an extremely important subject: memory
(also I saw that you changed your username)