New Bite
MocaCDeveloper

Hello, again

Lol. Ok. This time I am actually sticking with the language. This language is Bite, yes I know I said I was making Bite like 2 months ago but long story short I wanted to make a compiled language, so here I am :)

What is Bite going to be

Bite is going to be a low-level statically typed language compiled down to 32 bit assembly for speed.

Note: I did very minimal research, and from what I read, compiling down to 32-bit assembly would result in faster programs.

So far, I have:

  • The main function.
  • Return statement for the main function (if anything but zero is returned, it results in an error)
  • Variable declarations (only int)
  • Printing statement

My next objective is to tackle the main function's arguments (if you write C, these are commonly known as argc and argv).
Then I'll enable the language to hold strings, characters, and f64 and f32 types, as well as different integer sizes.

So yeah. I feel very accomplished with what I have gotten done so far! :D

Comments
realTronsi

You had two choices:

func and fn, and yet you chose fun.

anyways what did you use to compile?

realTronsi

@realTronsi also slightly bothered why "Print" keyword is capitalized lol

fuzzyastrocat

@realTronsi If I understand correctly, Bite is like Sea in that it's hand-compiled.

realTronsi

@fuzzyastrocat oh nice, you used javascript?!

realTronsi

@fuzzyastrocat I'm just curious how hand compiling works, do you somehow translate tokens into assembly or how does that work?

fuzzyastrocat

@realTronsi Since it was my first compiled language I used JS so that I wouldn't have to worry about details (and since it's compiled the JS doesn't matter performance-wise).

fuzzyastrocat

@realTronsi Nonononono, it's just like a normal lang where you lex, parse, and generate an AST. Then, instead of executing the AST, you generate assembly code for each node. So if I had the expression 1 + 2 (a plus node whose children are the number constants 1 and 2), I would recursively call a function which:

  • Checks what type of node it is. It's a plus node, so it:
    • Calls itself on the first child node:
      • Checks what type of node it is. That's a number constant node, so it outputs something like mov $1, %rax and returns.
    • Calls itself on the second child node:
      • Checks what type of node it is. That's a number constant node, so it outputs something like mov $2, %rbx and returns.
    • Then outputs add %rax, %rbx and returns.

So you end up with the needed assembly for that AST: mov $1, %rax, then mov $2, %rbx, then add %rax, %rbx.

(Obviously the assembly there is just for example, it would probably be different in reality.)
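
A rough Python sketch of that recursion, in case it helps. The tuple-based node shapes and the gen name are made up for illustration. It also swaps the fixed %rax/%rbx pair for a push/pop scheme, so that nested expressions don't overwrite each other's registers:

```python
def gen(node, out):
    """Append AT&T-style assembly to `out` that leaves the node's value in %rax."""
    kind = node[0]
    if kind == "num":
        # Number constant node: load the literal.
        out.append(f"mov ${node[1]}, %rax")
    elif kind == "add":
        gen(node[1], out)              # left child  -> %rax
        out.append("push %rax")        # save it while the right child is computed
        gen(node[2], out)              # right child -> %rax
        out.append("pop %rbx")         # bring the left value back
        out.append("add %rbx, %rax")   # %rax = %rax + %rbx
    else:
        raise ValueError(f"unknown node type: {kind}")
    return out


ast = ("add", ("num", 1), ("num", 2))  # hypothetical AST for 1 + 2
print("\n".join(gen(ast, [])))
```

Every node leaves its result in the same register, which is what lets the function call itself on arbitrarily deep trees.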

realTronsi

@fuzzyastrocat hmm so you would need to know full assembly? what about features assembler doesn't support or something?

fuzzyastrocat

@realTronsi
1. Yeah, you need to know assembly to hand-compile.
2. There is no feature assembly doesn't support. There's always a way to do it in assembly. Some things (for instance, trying to do 1024bit arithmetic) will be harder than others to implement, but it's always possible. (All compiled languages are translated to assembly at some point — yes, even languages with LLVM, since its IR will get reduced to assembly — so there is a way to do it.)

realTronsi

@fuzzyastrocat I don't know assembly, but for example passing in parameters for functions

fuzzyastrocat

@realTronsi That's very, very possible, using the stack. It's the classic way compiled languages do it (64-bit calling conventions also pass the first few arguments in registers), and it's how you do it if you're coding in assembly alone:

Suppose we're calling add(1,2).

  1. Push each parameter onto the stack — push $1, push $2.
  2. Call the function: call add (which basically just pushes %rip and then jumps to the add label)

continued at the function itself:

  1. Push the current stack base — push %rbp
  2. Set up a new stack frame with a new base — mov %rsp, %rbp
  3. Do the function stuff, leave return value in register %rax.
  4. Pop any values that were pushed during the function's execution: mov %rbp, %rsp.
  5. Reset old stack frame: pop %rbp
  6. Return to the old place: ret (which basically just pops into %rip)

control now resumes back at the call, and the return value of the function is in %rax

Lengthy yes, but that's the standard procedure.
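
If you don't read assembly yet, here is a small Python simulation of those steps. It is only a sketch: the stack is a plain list whose end plays the role of %rsp, the registers are a dict, and the simulate_add_call name is invented for this example.

```python
def simulate_add_call(a, b):
    stack = []                        # the end of the list is the top of the stack (%rsp)
    regs = {"rbp": None, "rax": 0}

    # Caller side
    stack.append(a)                   # push $1
    stack.append(b)                   # push $2
    stack.append("return address")    # call add: pushes %rip, then jumps

    # Callee prologue
    stack.append(regs["rbp"])         # push %rbp
    regs["rbp"] = len(stack) - 1      # mov %rsp, %rbp

    # Function body: the parameters sit below the saved %rbp and the return address
    second = stack[regs["rbp"] - 2]
    first = stack[regs["rbp"] - 3]
    regs["rax"] = first + second      # return value goes in %rax

    # Callee epilogue
    del stack[regs["rbp"] + 1:]       # mov %rbp, %rsp: drop anything the body pushed
    regs["rbp"] = stack.pop()         # pop %rbp
    stack.pop()                       # ret: pops the return address back into %rip

    return regs["rax"], stack


result, leftover = simulate_add_call(1, 2)
print(result, leftover)
```

Note that the two arguments are still on the stack afterwards. The steps above don't cover removing them; in one common convention the caller does that right after the call returns.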

realTronsi

@fuzzyastrocat wow, what about stuff like dynamic variables? Is there stuff like void pointers in assembly or something? (I'm completely clueless)

fuzzyastrocat

@realTronsi What do you mean by "dynamic variables"?

If you mean variables with dynamic type, then that's an interesting question. See, in assembly, there is really only one type: word. (There's different sized words, but that's irrelevant.) A word is basically just a number stored in memory. That's the only type — it's the only thing you can push to the stack, only thing you can hold in registers, etc.

So how do you represent strings? Well, you start by giving each char a number (namely, the ascii value) and now a word can hold a char. That lets you do c-style strings by either:

  • Pushing each char to the stack. This works for fixed size strings (and in theory works for variable length strings) but it gets pretty cumbersome.
  • Using heap allocation, like malloc. This is great for variable length strings, but it's a bit inefficient.

Now, those two methods are exactly how C does it. When you declare a fixed-size char array as a local variable in C, it goes on the stack, char by char (string literals themselves usually live in a read-only data section). The only way to make variable-length strings is with malloc, and at that point you're doing everything yourself.

So that's strings. Arrays are similar — if you're defining a fixed-size array it would probably go on the stack, and if you want a variable length array you'd put it on the heap.
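
To make the "a string is just a run of numbers" idea concrete, here is a minimal Python sketch. It models memory as a flat list with one word per cell; the helper names store_cstring and load_cstring are invented for illustration.

```python
memory = [0] * 32  # pretend memory: one word per cell


def store_cstring(mem, addr, text):
    """Write each char's ASCII value starting at addr, then a 0 terminator."""
    for offset, ch in enumerate(text):
        mem[addr + offset] = ord(ch)
    mem[addr + len(text)] = 0
    return addr  # the "pointer" to the string is just its starting address


def load_cstring(mem, addr):
    """Read words until the 0 terminator and turn them back into text."""
    chars = []
    while mem[addr] != 0:
        chars.append(chr(mem[addr]))
        addr += 1
    return "".join(chars)


ptr = store_cstring(memory, 4, "blah")
print(memory[4:9])                 # the raw words, terminator included
print(load_cstring(memory, ptr))
```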

Now what about structs? Well, structs are actually just arrays, except each element in the struct has been given a name by the compiler for you. So structs don't actually add anything new, it's just a nicer way to record an array. Since we're dealing with assembly, we don't have nicer anything, so we can ignore structs since they can be emulated with arrays.

Once we have "structs", we can do pretty much anything. You can make a linked list, or an "object" (hashmap), or a binary tree, or whatever data structure you like.
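
Here is a hedged Python sketch of that idea: a "struct" is nothing more than named offsets into a flat array of words, and a linked list falls out of it. The offsets, the NULL marker and the function names are all assumptions made for this example.

```python
memory = [0] * 16  # pretend memory: one word per cell

# The "struct" layout: the names exist only in our heads (or in the compiler).
NODE_VALUE = 0   # offset of the value field
NODE_NEXT = 1    # offset of the pointer to the next node
NULL = -1        # address 0 is valid in this toy memory, so use -1 for "no node"


def make_node(mem, addr, value, next_addr):
    mem[addr + NODE_VALUE] = value
    mem[addr + NODE_NEXT] = next_addr
    return addr


def walk(mem, addr):
    """Follow the next pointers and collect each node's value."""
    values = []
    while addr != NULL:
        values.append(mem[addr + NODE_VALUE])
        addr = mem[addr + NODE_NEXT]
    return values


tail = make_node(memory, 6, 20, NULL)
head = make_node(memory, 2, 10, tail)
print(walk(memory, head))
```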

So, that's how you deal with data structures in assembly. As for void pointers, you can now see why the idea of a "void pointer" isn't really applicable to assembly. Since assembly only really has one type, a void pointer (a pointer to any type) is just the same as a pointer since "any type" just means "the one type assembly has".

If you look at it from another angle, every pointer in assembly is a void pointer, since data structures don't exist and therefore it doesn't know if you're pointing to a word, a string you made, a "struct", the head of a linked list you made, or whatever. That's up to you, the programmer, to remember.

realTronsi

@fuzzyastrocat hmm so what's up with having to define the type in lower-level languages? For example, in C you need to write something like int foo = 5; and that variable will always be an int, whereas in some higher-level languages you can do stuff like var foo = 5 followed by foo = "blah".

If everything is just a number, why can't low-level languages change the variable type?

fuzzyastrocat

@realTronsi Well, because not everything is a number (I said the only type in assembly is a word, or number, but not every type in languages like C is a number). In your second example, foo has the size "one word" when you assign 5 to it (the number 5 can be represented as a single word). But when you assign the string "blah" to it, foo now has the size "5 words" (1 word for each character, and one for the null terminator). That means the variable now has to store how large it is (type information) along with its actual value.

So, statically-typed low-level languages could have dynamic typing, but that would mean inefficiency (both memory-wise and performance wise).
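
A small Python sketch of that overhead, under an assumed layout (a type tag word, a size word, then the payload). The layout and names are invented for illustration; real dynamic languages each use their own representation.

```python
TAG_INT, TAG_STR = 0, 1  # hypothetical type tags


def encode_static_int(value):
    """Statically typed: the compiler already knows it's an int, so one word is enough."""
    return [value]


def encode_dynamic(value):
    """Dynamically typed: store a tag and a size in front of the payload words."""
    if isinstance(value, int):
        return [TAG_INT, 1, value]
    if isinstance(value, str):
        payload = [ord(c) for c in value] + [0]  # chars plus null terminator
        return [TAG_STR, len(payload), *payload]
    raise TypeError(f"unsupported value: {value!r}")


print(len(encode_static_int(5)))    # words needed for a static int
print(len(encode_dynamic(5)))       # words needed once the type travels with the value
print(len(encode_dynamic("blah")))  # 5 payload words plus the 2 bookkeeping words
```

Every read of a dynamic value also has to check the tag first, which is where the performance cost comes from.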

realTronsi

@fuzzyastrocat hmm okay, this makes sense

xxpertHacker

@realTronsi Very late, but not necro-posting yet,

and that variable will always be an int, whereas in some higher level languages you can do stuff like

var foo = 5
foo = "blah"

In truth, that literally happens. After analyzing C++ compiler output plenty of times, I've noticed that compilers do some things that can't be expressed in the source language.

Take, for example, a function that accepts a reference to an int.
Assuming that an int has the size of a word on that platform, so that a reference is one word and dereferencing it yields one word, the compiler can optimize this by dereferencing the reference in place: the same storage that held the address ends up holding the int.

The compiler would actually do something analogous to:

Also, in dynamically-typed languages like JavaScript, each variable actually is the size of an object reference.

So whenever you say x = 9, you're not storing a small 32-bit integer with the value of 9, you're storing the value 9 into a massive, object-sized space.

So all operations are slower and consume more memory than they need to in those languages.

elipie

Hello, I am interested in making languages in C++, and tbh you look like you have been making languages since you were born! I need a LOT of help, and I have A LOT of questions, and online tutorials just don't do it for me. You do not have to help me out, but I would appreciate it :D

MocaCDeveloper

@elipie

Sure! Ask away I am always willing to help!

elipie

@MocaCDeveloper can we get on a repl, because I would like to code while getting answers xD

MocaCDeveloper

@elipie

ok sure!
sorry I was a bit busy today! but i am going to be available tomorrow!

MocaCDeveloper

@elipie

hey. just let me know when you're available!

I should be available all day

elipie

@MocaCDeveloper ok, im available right now, I have been doing stuff all weekend

MocaCDeveloper

@elipie

Ok I will be on in a minute maybe. I am probably gonna be busy for a bit but I will try and get on!

MocaCDeveloper

@elipie

Just let me know when you're available.

elipie

@MocaCDeveloper ok rn would be ok.

elipie

@MocaCDeveloper ill invite u to the repl

MocaCDeveloper

@elipie

Ok! Heads up, if it is in C++ I will struggle a bit but I will get the hang of it.
If it is in C then we'll be able to get right into it!

elipie

@MocaCDeveloper back, this is kinda lame, talking on repl talk my discord is elipie#6261

MocaCDeveloper

@elipie

I don't have discord on my school computer :/

elipie

@MocaCDeveloper F get on the repl?

elipie

@MocaCDeveloper me ready. i invited u to the new one

MocaCDeveloper

@elipie i'm on!

DynamicSquid

Nice! Was learning assembly hard?

MocaCDeveloper

@DynamicSquid

Learning assembly is one of those things where you learn as you go. You read a tutorial on how to print using assembly, then do a crap ton of research on what each thing does (for example, I looked up what mov is and how it works, which registers are which and how they're used, etc.).

I would say that you really have to understand every aspect of assembly and its registers in order to have a good understanding of how it works.

MocaCDeveloper

@DynamicSquid

Also, I highly suggest doing a lot of research lol. Don't be like me and read only one piece of documentation xD

fuzzyastrocat

Nice! Just a note: compiling to 32 bit assembly might be faster, but it'll be much harder.

DynamicSquid

@fuzzyastrocat What's the difference between 32-bit assembly and AT&T assembly? Sorry if that's a dumb question :/

fuzzyastrocat

@DynamicSquid There is no dumb question if you're sincere about it :D

That's fruits and vegetables again. Those are two different categories:

  • You have AT&T assembly vs Intel (ASM) assembly
  • You have 32 bit assembly vs 64 bit assembly

Neither of those categories requires a specific choice from the other.

DynamicSquid

@fuzzyastrocat Oh, okay. So what's the difference between 32 and 64 bit? Is it just for different computers?

fuzzyastrocat

@DynamicSquid 64 bit has native support for 64 bit precision integers. 32 bit does not.

That's the most common difference, however there are some more detailed things that can be difficult in 32 bit assembly. For one, support for it is being phased out, so it can be hard to find tools/resources for it. For another, generating Position Independent Executables (I won't go over what they are here, but it's basically code that can run no matter where it gets loaded into memory because it never accesses fixed addresses, only relative offsets from the instruction pointer) is way more difficult in 32bit than in 64bit.

There's more fine points like this that don't seem immediately apparent but may come back to bite you (no pun intended) if you aren't careful.
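
To make the first point concrete: without native 64-bit integers, a 64-bit addition has to be split into two 32-bit halves with a carry passed between them (on x86 this is typically an add followed by adc, add with carry). A minimal Python sketch of that arithmetic, with an invented function name:

```python
MASK32 = 0xFFFFFFFF


def add64_using_32bit_words(a, b):
    """Add two 64-bit values using only 32-bit-sized pieces, wrapping on overflow."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32                  # low word of the result (the "add")
    carry = lo_sum >> 32                  # 1 if the low halves overflowed, else 0

    hi = (a_hi + b_hi + carry) & MASK32   # high word of the result (the "adc")
    return (hi << 32) | lo


print(hex(add64_using_32bit_words(0xFFFFFFFF, 1)))
```

In 64-bit assembly the same addition is a single instruction on 64-bit registers.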

MocaCDeveloper

@DynamicSquid

It is x86 assembly compiled down to 32 bit assembly.
And I seriously don't know what syntax I am writing, but I believe it is AT&T.

I am just stating this from my minimal knowledge. As I said, I did minimal research.

MocaCDeveloper

@DynamicSquid

The difference is (a) the syntax of the assembly and (b) the speed, as well as the OS you're running. x86 and x64 were originally made for different computer systems, which is why making a compiler is a bit hard, and I believe that is why compiling down to 32-bit assembly was the best choice, because then any system can compile it.

Do not take my word as the truth, I did minimal research and I am just taking a wild guess that x86 and x64 assembly are for different systems
^ It would make sense tbh, but if I am wrong correct me!

fuzzyastrocat

@MocaCDeveloper Yes, you are using AT&T. EDIT: Oops looked at wrong code, you're using Intel ASM.

What do you mean by "x86 assembly compiled down to 32 bit assembly"?

MocaCDeveloper

@fuzzyastrocat

Yeah. You know way way more than me. I literally just looked up how to compile x86 down to 32 bit and I was rolling ahead lol

MocaCDeveloper

@fuzzyastrocat

I compile it down to 32 bit binary, I believe. I don't believe I write 32 bit assembly. The tutorial I saw had only compiling from x86 to 32 bit or x64 to 32 bit and I am using nasm so I believe I am compiling from x86 to 32 bit..Idk hang on let me see if I can find the documentation I was reading lol.

I should have probably put more research into it, but I was just looking for a way to compile a language quickly lol

MocaCDeveloper

@fuzzyastrocat

Ok, lol, I am dumb

I AM writing 32 bit assembly but I have to specifically compile it TO 32 bit because by default it runs as x86 assembly(with nasm and the gcc compiler)

fuzzyastrocat

@MocaCDeveloper A few things:

x64 is the same as x86-64. There is no syntactical difference between them. x64 is simply a 64-bit extension of the x86 instruction set.

You cannot compile assembly. Assembly gets assembled so the assembler produces the binary.

fuzzyastrocat

@MocaCDeveloper

but I have to specifically compile it TO 32 bit

This is not true. Because of how similar x86 and x64 are, I'd bet you could assemble a 64-bit binary with minimal (or even no) code changes.

fuzzyastrocat

@MocaCDeveloper Oh wait, I was looking at the wrong code. No, you're not using AT&T, you're using Intel ASM.

MocaCDeveloper

@fuzzyastrocat

Yep I was just about to correct myself on that!

Also, I really don't know anything about assembly except for the fact I am writing the compiler in 32 bit assembly and using nasm to compile the 32 bit assembly.

I should've probably done more research, but I just found some documentation that walked through 32-bit assembly and how to assemble it with nasm, and that was all I needed.

MocaCDeveloper

@fuzzyastrocat

I knew that x86 and x64 were a bit similar, but I did not know that I could compile my assembly as x64 and almost everything would be the same.

I might do that, but I don't know.
What are the benefits of writing x64 instead of 32 bit, since 32 bit is probably a bit faster?

fuzzyastrocat

@MocaCDeveloper As for why you'd want to assemble it as 64 bit, read my comment above about some of the advantages of 64bit.

If you're wanting to compile straight down to assembly, you really need a firm grasp of how it works and how everything related to it works. I'd really encourage reading up on it. A great resource for this (it's not 64 bit, but all the code will work as 64 bit if you change the 32-bit specific registers like %e__ to %r__) is https://norasandler.com/2017/11/29/Write-a-Compiler.html, a link which I believe I've given you before (but I might be wrong).