Instruction set simulators, and how to make one!
What is an instruction set simulator?
An instruction set simulator (ISS) is a simulation model of a low-level processor. Microprocessors execute machine code instructions, so an ISS could also be thought of as an interpreter for machine code programming. (wikipedia)
That sounds hard!
So, what is an instruction set?
Briefly, it's a list of instructions that a microprocessor can follow.
To be more in depth, we have to understand what the microprocessor does. The microprocessor or CPU of your computer gets data from memory (or input), "processes it", and then writes it back to memory (or output). A CPU or microprocessor also usually has limited internal memory too, called registers, which hold single values for the processor to work on. The instruction set defines these "processing" instructions such as move a value from RAM to a register, or jump to this instruction, and maybe also add this value from RAM to this register. Many instructions have operands (which are the addresses in RAM or registers that the processor operates on), but there's also instructions like NOP, or "no operation", which just stops execution. (wikipedia)
Why make an instruction set simulator?
If you're not already convinced, consider the following.
1. In the real world, an ISS is often used for debugging, because it allows you to better see what's going on and step through exactly what the computer is doing.
2. If you are designing a computer (any redstone engineers?) or more specifically, a processor, it allows you to try creating programs and test different configurations that help you achieve your goals.
3. ISS's are also great for anyone who wants to learn how computers work on the processor level. They really help you get a feel for what's really going on in your computer!
4. They allow you to create your own (very low-level) programming language! Especially useful for making low-level style esoteric languages like BrainF---.
And lastly, it's fun! So without further ado, let's get into it!
Step 1, building the computer
A microprocessor needs memory to work with, so let's make some! This is also a good time to consider the word size of your virtual computer, which is how big the memory values are (in bits). It's often easiest to just use whatever word size your programming language use, which just means using "regular" integers. If you want a specific size, for example a 16-bit computer, which uses number up to 2^16 (though you can simulate bigger numbers), then you will have to find how to get the right integer type in your language or simulate a smaller word size memory. To simulate N-bits memory, you need to modulo (%) 2^N after every operation.
Next consider how instructions are stored. Should they be in the same RAM where you store other values? What is your instruction's bit length?
The computer I'm building (or really just simulating) is the Minecraft redstone computer described in this tutorial. It's an 8-bit computer and it stores it's instructions and data in the same memory, so we'll declare one 8-bit array called
In C / C++ you would probably want (I think):
If you want a fixed-length integer (as I think it's called) in Python, you might need to do some special stuff.
I'll explain later why I made the array's length 16. Also, I'm using unsigned integers (
Uint), but it doesn't matter too much for an ISS.
The aforementioned redstone computer, called MASIC also has 2 general-purpose registers, which will both be 8-bits too. To keep them at 8-bits and also to help group them together, we'll define the registers as another
Uint8Array with a length of 2.
Register 1 is
REGISTER and register 2 is
REGISTER. We're cooking now! Now we just need a few important special-purpose registers.
1. A program counter, or PC, which will hold the address of the instruction we're on. When we execute an instruction, we increment the PC and we'll fetch the next instruction after.
2. A current instruction register, or CIR. The CIR stores the instruction we load from program memory. This is less important in an ISS than in a real microprocessor, but I'm still using it to make things easier.
We'll initialize both to 0 (in binary because it's cool).
They have the same value, but notice how I wrote 4 bits for the PC and the CIR has 8. There are only 16 addresses for instructions (and data) so the PC only needs 4 bits. This will also be explained more later. The instructions we will use occupy the full 8 bits, so the CIR should have 8 bits. I didn't put these in a
Uint8Array because I'm not worried about them overflowing if we use them right, and it makes the code a little easier and more readable.
Step 2, the instruction set (planning)
Which instructions do you want? If you're building a redstone computer in Minecraft, chances are you can only do a few arithmetic and logic operations like adding and subtracting. If you want a more modern instruction set, you will likely include more complex instructions such as multiply and divide. The MASIC instruction set has 15 instructions including load from memory address to register 1 and add memory address to register 2. These instructions are usually laid out in a opcode table, which assigns a binary (or hexadecimal) code to each instruction. The MASIC instruction set can fit in 4 bits worth of unique codes, and are laid out like this:
|0000||LOAD R1||Load the ADDRESS into register 1|
|0001||STORE R1||Store contents of register 1 into ADDRESS|
|0010||JUMP R1 IF||Jump to line ADDRESS if register 1 is equal to 0|
|0011||ADD R1||Add contents at ADDRESS to register 1|
|0100||<<R1||Bitshift register 1 left|
|0101||NOT R1||Bitwise NOT register 1|
|0110||JUMP||Jump to line OPERAND|
|0111||STOP||Terminate the program.|
|1000||LOAD R2||Load the ADDRESS into register 2|
|1001||STORE R2||Store contents of register 2 into ADDRESS|
|1010||JUMP R2 IF||Jump to line ADDRESS if register 2 is equal to 0|
|1011||ADD R2||Add ADDRESS to register 2|
|1100||<<R2||Bitshift register 2 left|
|1101||NOT R2||Bitwise NOT register 2|
|1110||OUT R1||Outputs register 1|
You'll notice many instructions have something to do with ADDRESS. Here's how MASIC instructions work: MASIC instructions are 8 bits, stored in 8-bit memory. The instructions are split into 2 parts. The first 4-bits are the "type", which are what the computer should do and are decoded according to the above table. The lower 4-bits are the "address", which is a 4-bit address you might read from, write to, or jump to. 4 bits gives us 16 addressable memory slots, which is why the "RAM" is 16 "words" long. Jumping also sets the program counter to the "address", which is why it's 4 bits.
This sort of instruction splitting is pretty common for instruction sets, but it can be done any number of ways and it doesn't have to be done this way at all. If we had 8 or less commands and wanted more memory to use, we could use 3 bits for "type" and 5 bits for "address", giving us 32 words of addressable memory!
Alternatively, you could have a system where instructions take up the whole memory, and additional data like the "address" could go in the next memory slot. This is one place where you can really get creative! If you want, say 8-bit values but need maybe 10 bits for instructions, you could make a separate memory for instructions with at least 10 bits, and a separate memory for data, like variables and constants.
Step 3, the instruction set (implementing)
Woah. Most of this is pretty straight-forward, I think, but I'll explain a few things. Firstly, I added a new command "Input", code 1111 in binary. It sets the memory address specified in "address" to an integer inputted. Using
prompt for this and using
console.log for the "Output" command is rather arbitrary. You can choose any input and output you want, this is just what looks best in the repl.it nodejs console.
Step 4, RUN!
We have the computer, now it needs to run. Most microprocessors go through a cycle known as "fetch, decode, and execute". These three steps are commonly referred to as one CPU cycle. A processor's speed in gigahertz is how many billion cycles it can do in a second. These steps are what microprocessors do, but are relatively easy to code. Here's how it looked in my implementation for the MASIC instruction set:
INSTRUCTION object set up with functions that take a parameter, we can just use the
TYPE to get the correct function and pass it
ADDRESS as the parameter. The function will execute, and the cycle is complete.
Whew! So what now? We want to automate the cycles, keep doing this until we have to stop! We could make a
doOneCycle function and step through our programs or maybe set an interval and do cycles at a certain rate like a real CPU, but I'm going to make a
RUN function that runs until it has to
We might forget to stop our programs properly, so I added a
cycles parameter to specify how many cycles to do until we should stop executing. Now out "microprocessor" will run until it
STOPs, or until we've done too many cycles and have to stop, perhaps to prevent infinite loops.
And we're done!
Congratulations, you have just created an instruction set simulator! So what now? We program it!
Programming your microprocessor, or simulating machine code
In your memory definitions, you could set specific memory locations to your instructions and then run as normal, but this is rather annoying. I would suggest coming up with some program format that can be translated into values and put in the memory. You could make some sort of assembly language or use hexadecimal values, but I'm just going to go with a string with new instructions on newlines, 4 binary bits for the type, a space, and 4 bits for the address. He is what the example "Fibonacci" program described in the MASIC tutorial looks like. It has instructions, some blank unused memory, and some data values:
I added the comment at the bottom because this program loops infinitely and after 41 cycles it reaches the maximum value that can be stored in 8 bits. After that it will overflow and produce incorrect results.
To run this program, we need to parse the string and load it into memory. We will probably want to make sure the memory and registers are reset first, and run the program after it's loaded into memory. Here's my function:
It optionally takes a number of cycles (the max is set at 10,000) and a value for the program counter, for if you wanted to start the program at a different instruction. It iterates through each line in the the "program" specified and sets the memory at that place to the integer value in the program, parsed as binary (base 2). It then
RUNs the program for the given cycles (or 10,000) and then logs the message "process finished".
Here's another short program that I wrote that uses the new "Input" command:
Running is as easy as:
Wow! I've written a lot and it's getting late, but I hope you find this tutorial useful. The repl I added below (also available here), shows the finished product and runs the "Fibonacci" and "square" programs when you run it. It also contains a MASIC.md file which contains the chart of instruction and an programs.txt file that contains some program challenges. Programming MASIC is a challenge! You'll find 16 bytes of memory pretty restrictive!
In the comments, please add your own ISS if you make one, any port of the MASIC ISS, and any programs you create for the MASIC ISS or your own ISS!