How To Make A Language: Parsing
The last tutorial didn't really give an in depth explanation as to what everything was doing aside from some comments, however for parsing I feel I should explain how it works.
Markdown was really weird on this post.
We are going to be using a recursive descent parser, written all by hand and with no dependencies.
Lets start with an expression
8 + 5 * 4 / 2
The first thing we are going to do is tokenize it, so this expression will tokenize to this
1: ("Number", 8) 2: ("Operator", "+") 3: ("Number", 5) 4: ("Operator", "*") 5: ("Number", 4) 6: ("Operator", "/") 7: ("Number", 2)
Im going to do a little experiment here, and if it works you should be able to read the above stream of tokens just like an rdp parser. When you see
Add(), Sub(), Mul(), Div() and Literal() in the text below jump to the corresponding label!
Start with Add.
Left = Sub()
Is the next token a + operator? If so then: Right = Sub() Return (Left, "+", Right) Otherwise return the left side
Left = Mul()
Is the next token a - operator? If so then: Right = Mul() Return (Left, "-", Right) Otherwise return the left side
Left = Div()
Is the next token a * operator? If so then: Right = Div() Return (Left, "*", Right) Otherwise return the left side
Left = Literal()
Is the next token a / operator? If so then: Right = Literal() Return (Left, "/", Right) Otherwise return the left side
Return the current token and advance to the next one
Now, if you went through everything above and kept track of where you were in the sequence you will notice that you end with a tree that looks roughly like this:
Well let's just stop here then and all just take a break. :)
Or is it this one?
If we learn of any user that is under the age of 13, we will immediately terminate the user’s Account and delete any and all information we may have about that user.
Cool! I wanted to learn how to make a parser, now I know! But what is the node.type thing?
just make a bunch of functions and classes /s
All this really does is parse an expression and then call Python's built-in operator, which probably calls C's built-in operator, which already sounds inefficient. Now if this were compiled into some binary that performed this, and maybe held some variables...
Isn't recursion like, the worst way to do anything? Aren't bottom-up non-recursive parsers usually more efficient and fast?