Making a compiled programming language
Hi. I am starting a new programming language called Bite. It aims to offer advanced features for working with memory and for low-level development, making that kind of work not only easier but also more readable and flexible.
Bite is interpreted as of right now, but that is actually a really BIG issue because Bite is going to depend on being fast.
And this is where the issue emerges. Compiled languages tend to run faster than interpreted ones, and this is why I NEED to make Bite compiled.
C'mon now, let's be serious: would it make sense for a language aimed at making low-level development easier to be interpreted? NO!
So, here's the question:
Does anyone know of any documentation (or articles) that goes step by step through, or at least explains, how to create a compiled programming language?
I have researched it multiple times before, but maybe I am not digging deep enough, and I often run short on time or get stuck.
So, would anyone be willing to help me? Please and thank you, it will be much appreciated.
I know that a compiler gathers all of the source code, tokenizes it, and builds a syntax tree from the tokens. I just always find myself needing a runtime, and this is where I get a bit confused about where the compiled side of things comes in.
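To make the tokenize-then-build-a-tree part concrete, here is a minimal sketch in Python for plain arithmetic expressions. This is not Bite; the grammar, the function names `tokenize` and `parse`, and the tuple-based tree shape are all assumptions made up for illustration.

```python
import re

def tokenize(source):
    """Split source text into number and operator tokens (whitespace is skipped)."""
    return re.findall(r"\d+|[-+*/()]", source)

def parse(tokens):
    """Recursive-descent parser: returns an int or an (op, left, right) tuple."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # expr := term (("+" | "-") term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        # term := factor (("*" | "/") factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        # factor := NUMBER | "(" expr ")"
        tok = advance()
        if tok == "(":
            node = expr()
            advance()  # consume the closing ")"
            return node
        return int(tok)

    return expr()

tree = parse(tokenize("1 + 2 * 3"))
print(tree)  # ('+', 1, ('*', 2, 3))
```

Up to this point an interpreter and a compiler look the same. The difference is what happens to the tree next: an interpreter walks it and computes the result, while a compiler walks it and writes out code in some other form.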
Any help would be great!
Compilation is just the generation of executable machine instructions. Whether it is JIT or AOT, executable code is generated either way.
Some would consider outputting Asm to be compiling too, since there is a near one-to-one correspondence between a binary format and its Asm textual representation, although Asm itself isn't executable until it has been assembled.
You could learn binary / Asm and directly output raw machine instructions into a file, but that would be foolish.
I recommend looking into compilation libraries like LLVM, QBE, Cranelift, etc.
(for the time being, I personally am going to output direct, unoptimized binary for my language from the recent PL jam, but I'll probably switch to a library, or make my own optimization system and program synthesizer :) )
"that would be foolish."
I would have to disagree... I learned so much from making a compiler to AT&T assembly. I think it's a very viable option for a little personal lang.
However, you might run into trouble making it cross-platform. If you get the lang to the point of being something that others will use (which probably won't happen for quite some time), then yes, it might make more sense to switch to something like LLVM.
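For a sense of what compiling to AT&T assembly involves, here is a rough sketch that turns a small expression tree into x86-64 assembly text. The tuple tree shape and the names `emit` and `compile_program` are assumptions for illustration, and the output assumes x86-64 Linux, where the value left in %rax when `main` returns becomes the exit status. It only handles +, - and * and does no optimization at all.

```python
def emit(node, out):
    """Append instructions that leave the value of node in %rax."""
    if isinstance(node, int):
        out.append(f"    movq ${node}, %rax")
        return
    op, left, right = node
    emit(left, out)                    # left operand -> %rax
    out.append("    pushq %rax")       # save it while the right side is computed
    emit(right, out)                   # right operand -> %rax
    out.append("    movq %rax, %rcx")  # right operand -> %rcx
    out.append("    popq %rax")        # left operand back in %rax
    if op == "+":
        out.append("    addq %rcx, %rax")
    elif op == "-":
        out.append("    subq %rcx, %rax")
    elif op == "*":
        out.append("    imulq %rcx, %rax")
    else:
        raise ValueError(f"unsupported operator: {op}")

def compile_program(tree):
    """Wrap the generated expression code in a main function."""
    out = ["    .globl main", "main:"]
    emit(tree, out)
    out.append("    ret")
    return "\n".join(out) + "\n"

# Tree for 1 + 2 * 3, written out by hand to keep the sketch short.
print(compile_program(("+", 1, ("*", 2, 3))))
```

The printed text could be saved to a `.s` file and handed to an assembler and linker (for example via gcc). This is also where the cross-platform trouble shows up: register names, calling conventions, and symbol naming all change on another CPU or OS.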
@fuzzyastrocat In the long term, it is definitely foolish. There is no way you can compete with the libraries out there. Now, there is definitely a lot to learn from Asm languages, sure. But eventually, your job will be much harder trying to generate Asm / binary for everything, catching edge cases, optimizing, etc.
@fuzzyastrocat I'm following my own advice here; since I already know a low-level language, for the time being, I don't need something as powerful as LLVM, yet.
But if you don't know an Asm language already, it might just be better to start with LLVM or a higher-level IR format instead of Asm / binary bytecode operations. But I'm not forcing either way upon others.
I know that the accepted answer is https://craftinginterpreters.com, but I actually find Nora Sandler's guide https://norasandler.com/2017/11/29/Write-a-Compiler.html very helpful. It sadly isn't complete, but it does a great job of keeping things simple and balancing between showing you stuff and having you do stuff. And, even though Sandler's guide is incomplete, by the end of it I had the knowledge I needed to continue building my compiler. I'm not trying to say that craftinginterpreters is bad, I just think that it might be a lot to take in for a first time building a compiled language.
However, the opposite could be true. Sandler's guide compiles to x86 assembly, which is then assembled into native machine code, so it is a "true" compiler, while craftinginterpreters compiles to bytecode for a virtual machine. The virtual machine approach will naturally be slower, but it might be an easier target to translate to: finding documentation for x86 can be tricky sometimes, but if you make the VM yourself, you'll know how it works.
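To show what the virtual machine route looks like in miniature, here is a sketch of a tiny stack VM in Python. The instruction names (PUSH, ADD, SUB, MUL), the tuple tree shape, and the function names are invented for this example and are not taken from either guide.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def compile_to_bytecode(node, code=None):
    """Flatten an expression tree into a list of stack-machine instructions."""
    if code is None:
        code = []
    if isinstance(node, int):
        code.append(("PUSH", node))
    else:
        op, left, right = node
        compile_to_bytecode(left, code)
        compile_to_bytecode(right, code)
        code.append((OPCODES[op],))
    return code

def run(code):
    """Interpret the bytecode: this loop is the 'runtime' the VM approach needs."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
            continue
        right = stack.pop()
        left = stack.pop()
        if instr[0] == "ADD":
            stack.append(left + right)
        elif instr[0] == "SUB":
            stack.append(left - right)
        elif instr[0] == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {instr[0]}")
    return stack.pop()

bytecode = compile_to_bytecode(("+", 1, ("*", 2, 3)))
print(bytecode)       # [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('MUL',), ('ADD',)]
print(run(bytecode))  # 7
```

The `run` loop is why this approach is slower than native code: every instruction goes through an interpreter at run time, whereas a machine-code compiler leaves nothing between the program and the CPU.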
In the end, it depends on what you want. Since your language seems to be so low-level, I would highly suggest compiling to machine code. But, either option will work.
@MocaCDeveloper Yes. In fact, I would not say craftinginterpreters.com teaches a compiled language. It converts the user code to bytecode for an interpreted virtual machine. I'm not saying it's a bad site, it's a really great tutorial, but it's just not a compiled language per se (after all, it's called "crafting interpreters"). If you truly want a compiled language, you'll want to generate machine code, which craftinginterpreters does not teach you how to do.
You should try making an interpreted language first, or transpile the language into something like C++ and let the C++ compiler compile it for you.
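As a rough illustration of the transpile route, this Python sketch turns a small expression tree into C++ source text. The tuple tree shape and the names `to_cpp_expr` and `transpile` are made up for the example; a real language would need statements, types, and so on.

```python
def to_cpp_expr(node):
    """Render an expression tree as a fully parenthesized C++ expression."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    return f"({to_cpp_expr(left)} {op} {to_cpp_expr(right)})"

def transpile(tree):
    """Wrap the expression in a C++ program that prints its value."""
    return (
        "#include <iostream>\n"
        "\n"
        "int main() {\n"
        f"    std::cout << {to_cpp_expr(tree)} << std::endl;\n"
        "    return 0;\n"
        "}\n"
    )

print(transpile(("+", 1, ("*", 2, 3))))
```

The output can be written to a `.cpp` file and built with any C++ compiler, which then takes care of optimization and of targeting different platforms.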
@Coder100 I will get to it when I am ready for the frustrations of facing the hardship of figuring out what the heck to do for each step!
But, if I put my head into it, I could probably write up a very well functioning lexer AND parser in about 2-3 hours!
Also, we need to start working on that Election Poll app soon. Another question: are you at all familiar with programming language development and the C language?
And isn't LLVM a C++ library (or something of the sort)?
I don't know, I am really confused :(
I will try to help un-confuse you :D
LLVM is a C++ library (it also has a C API). The simplest way to explain it is that you tell it how to compile your code based on your AST. Basically, for each node of your tree (your AST), you'll tell LLVM what "machine code" (really LLVM IR, its intermediate representation) to create. Then, LLVM handles the generation of the real machine code for you.
While this sounds easy, LLVM is a huge library and it can be hard to get started. I'd suggest only using LLVM once you've had experience with hand-compiling.
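To give a feel for the "one piece of LLVM code per AST node" idea without installing anything, here is a sketch that writes textual LLVM IR by hand from Python instead of calling the LLVM library. The tuple tree shape and the function names are assumptions for illustration; real projects normally build IR through LLVM's API or a binding rather than by string formatting.

```python
import itertools

INSTRUCTIONS = {"+": "add", "-": "sub", "*": "mul"}

def emit_ir(node, lines, counter):
    """Emit IR for node and return the operand (constant or %temp) holding its value."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    lhs = emit_ir(left, lines, counter)
    rhs = emit_ir(right, lines, counter)
    name = f"%t{next(counter)}"  # fresh temporary for this node's result
    lines.append(f"  {name} = {INSTRUCTIONS[op]} i32 {lhs}, {rhs}")
    return name

def compile_to_llvm_ir(tree):
    """Wrap the expression in a main function that returns its value."""
    lines = ["define i32 @main() {", "entry:"]
    result = emit_ir(tree, lines, itertools.count())
    lines.append(f"  ret i32 {result}")
    lines.append("}")
    return "\n".join(lines) + "\n"

print(compile_to_llvm_ir(("+", 1, ("*", 2, 3))))
# define i32 @main() {
# entry:
#   %t0 = mul i32 2, 3
#   %t1 = add i32 1, %t0
#   ret i32 %t1
# }
```

Text like this can be saved as a `.ll` file and passed to LLVM tools such as `lli` or `clang`, which handle the step down to real machine code.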
@MocaCDeveloper Despite the name, LLVM is not really a virtual machine in the usual sense. It is a compiler library that takes its IR format as input and outputs native machine instructions, either for the device it runs on or for another target. It is very useful for portability and optimization.
Now, as @fuzzyastrocat mentioned, LLVM is large, thus not easy to learn quickly. I would recommend at least dedicating a few weeks towards just learning LLVM, that is, if you're going to use it at all.
Once you've become familiar with it, it should be easy.