Learn to Code via Tutorials on Repl.it!

← Back to all posts
Instruction set simulators, and how to make one!
RyanYanko

What is an instruction set simulator?

An instruction set simulator (ISS) is a simulation model of a low-level processor. Microprocessors execute machine code instructions, so an ISS could also be thought of as an interpreter for machine code programming. (wikipedia)

That sounds hard!

Writing an interpreter for machine code can sound daunting, but it's actually easier than writing an interpreter for language like Python and a lot easier than you probably think! As it turns out, machine code instructions are executed one by one, and the instructions are often very simple. My simple implementation was only really about 32 lines of Javascript.

So, what is an instruction set?

Briefly, it's a list of instructions that a microprocessor can follow.
To be more in depth, we have to understand what the microprocessor does. The microprocessor or CPU of your computer gets data from memory (or input), "processes it", and then writes it back to memory (or output). A CPU or microprocessor also usually has limited internal memory too, called registers, which hold single values for the processor to work on. The instruction set defines these "processing" instructions such as move a value from RAM to a register, or jump to this instruction, and maybe also add this value from RAM to this register. Many instructions have operands (which are the addresses in RAM or registers that the processor operates on), but there's also instructions like NOP, or "no operation", which just stops execution. (wikipedia)

Why make an instruction set simulator?

If you're not already convinced, consider the following.
1. In the real world, an ISS is often used for debugging, because it allows you to better see what's going on and step through exactly what the computer is doing.
2. If you are designing a computer (any redstone engineers?) or more specifically, a processor, it allows you to try creating programs and test different configurations that help you achieve your goals.
3. ISS's are also great for anyone who wants to learn how computers work on the processor level. They really help you get a feel for what's really going on in your computer!
4. They allow you to create your own (very low-level) programming language! Especially useful for making low-level style esoteric languages like BrainF---.

And lastly, it's fun! So without further ado, let's get into it!

Step 1, building the computer

A microprocessor needs memory to work with, so let's make some! This is also a good time to consider the word size of your virtual computer, which is how big the memory values are (in bits). It's often easiest to just use whatever word size your programming language use, which just means using "regular" integers. If you want a specific size, for example a 16-bit computer, which uses number up to 2^16 (though you can simulate bigger numbers), then you will have to find how to get the right integer type in your language or simulate a smaller word size memory. To simulate N-bits memory, you need to modulo (%) 2^N after every operation.
Next consider how instructions are stored. Should they be in the same RAM where you store other values? What is your instruction's bit length?
The computer I'm building (or really just simulating) is the Minecraft redstone computer described in this tutorial. It's an 8-bit computer and it stores it's instructions and data in the same memory, so we'll declare one 8-bit array called MEMORY:

In C / C++ you would probably want (I think):

If you want a fixed-length integer (as I think it's called) in Python, you might need to do some special stuff.

I'll explain later why I made the array's length 16. Also, I'm using unsigned integers (Uint), but it doesn't matter too much for an ISS.

The aforementioned redstone computer, called MASIC also has 2 general-purpose registers, which will both be 8-bits too. To keep them at 8-bits and also to help group them together, we'll define the registers as another Uint8Array with a length of 2.

Register 1 is REGISTER[0] and register 2 is REGISTER[1]. We're cooking now! Now we just need a few important special-purpose registers.
1. A program counter, or PC, which will hold the address of the instruction we're on. When we execute an instruction, we increment the PC and we'll fetch the next instruction after.
2. A current instruction register, or CIR. The CIR stores the instruction we load from program memory. This is less important in an ISS than in a real microprocessor, but I'm still using it to make things easier.
We'll initialize both to 0 (in binary because it's cool).

They have the same value, but notice how I wrote 4 bits for the PC and the CIR has 8. There are only 16 addresses for instructions (and data) so the PC only needs 4 bits. This will also be explained more later. The instructions we will use occupy the full 8 bits, so the CIR should have 8 bits. I didn't put these in a Uint8Array because I'm not worried about them overflowing if we use them right, and it makes the code a little easier and more readable.
Onwards!

Step 2, the instruction set (planning)

Which instructions do you want? If you're building a redstone computer in Minecraft, chances are you can only do a few arithmetic and logic operations like adding and subtracting. If you want a more modern instruction set, you will likely include more complex instructions such as multiply and divide. The MASIC instruction set has 15 instructions including load from memory address to register 1 and add memory address to register 2. These instructions are usually laid out in a opcode table, which assigns a binary (or hexadecimal) code to each instruction. The MASIC instruction set can fit in 4 bits worth of unique codes, and are laid out like this:

BINARYOPCODECOMMENT
0000LOAD R1Load the ADDRESS into register 1
0001STORE R1Store contents of register 1 into ADDRESS
0010JUMP R1 IFJump to line ADDRESS if register 1 is equal to 0
0011ADD R1Add contents at ADDRESS to register 1
0100<<R1Bitshift register 1 left
0101NOT R1Bitwise NOT register 1
0110JUMPJump to line OPERAND
0111STOPTerminate the program.
1000LOAD R2Load the ADDRESS into register 2
1001STORE R2Store contents of register 2 into ADDRESS
1010JUMP R2 IFJump to line ADDRESS if register 2 is equal to 0
1011ADD R2Add ADDRESS to register 2
1100<<R2Bitshift register 2 left
1101NOT R2Bitwise NOT register 2
1110OUT R1Outputs register 1
1111

You'll notice many instructions have something to do with ADDRESS. Here's how MASIC instructions work: MASIC instructions are 8 bits, stored in 8-bit memory. The instructions are split into 2 parts. The first 4-bits are the "type", which are what the computer should do and are decoded according to the above table. The lower 4-bits are the "address", which is a 4-bit address you might read from, write to, or jump to. 4 bits gives us 16 addressable memory slots, which is why the "RAM" is 16 "words" long. Jumping also sets the program counter to the "address", which is why it's 4 bits.
This sort of instruction splitting is pretty common for instruction sets, but it can be done any number of ways and it doesn't have to be done this way at all. If we had 8 or less commands and wanted more memory to use, we could use 3 bits for "type" and 5 bits for "address", giving us 32 words of addressable memory!
Alternatively, you could have a system where instructions take up the whole memory, and additional data like the "address" could go in the next memory slot. This is one place where you can really get creative! If you want, say 8-bit values but need maybe 10 bits for instructions, you could make a separate memory for instructions with at least 10 bits, and a separate memory for data, like variables and constants.

Step 3, the instruction set (implementing)

This is the bulk of coding, and also the most challenging part to code (I think). In Javascript I use in object, but in any language you will want to use a system where you can associate a (preferably number) key with a (preferably function) value. You could also use an array, which will associate the index with the value. If you can't use functions, try function pointers or even strings containing the function's name.
I use binary numbers when writing out the keys, even though Javascript sees them the same as any numbers, because it makes it easier to me. For the MASIC instruction set, instructions may use an "address" so I use a parameter in the functions. Here's my full implementation:

Woah. Most of this is pretty straight-forward, I think, but I'll explain a few things. Firstly, I added a new command "Input", code 1111 in binary. It sets the memory address specified in "address" to an integer inputted. Using prompt for this and using console.log for the "Output" command is rather arbitrary. You can choose any input and output you want, this is just what looks best in the repl.it nodejs console.
Next, the "Jump if" commands use a little javascript trickey to see if the number is 0. This can also be done differently, and in a real computer would be quite a bit harder to implement this conditional jump. Also, when "negating" ( ~ ), and adding to 8 bit numbers, they will often "overflow" and "underflow". Javascript will keep them within the 8 bit range, but this is something to be aware of. Lastly the "Stop" command, (same as NOP), uses another global variable, which we'll use in the next part.

Step 4, RUN!

We have the computer, now it needs to run. Most microprocessors go through a cycle known as "fetch, decode, and execute". These three steps are commonly referred to as one CPU cycle. A processor's speed in gigahertz is how many billion cycles it can do in a second. These steps are what microprocessors do, but are relatively easy to code. Here's how it looked in my implementation for the MASIC instruction set:

In the fetch step we set the current instruction register to the memory at the location stored in the program counter. We then increment the PC, anticipating fetching the next instruction next. We then "decode" the instruction, but for this step I'm really just splitting up the instruction to make it a little easier. We shift the instruction in the CIR to get the upper 4 bits for type, and do some bitwise operating to get the lower 4 bit address. In javascript, how we have the INSTRUCTION object set up with functions that take a parameter, we can just use the TYPE to get the correct function and pass it ADDRESS as the parameter. The function will execute, and the cycle is complete.
Whew! So what now? We want to automate the cycles, keep doing this until we have to stop! We could make a doOneCycle function and step through our programs or maybe set an interval and do cycles at a certain rate like a real CPU, but I'm going to make a RUN function that runs until it has to STOP.

We might forget to stop our programs properly, so I added a cycles parameter to specify how many cycles to do until we should stop executing. Now out "microprocessor" will run until it STOPs, or until we've done too many cycles and have to stop, perhaps to prevent infinite loops.

And we're done!

Congratulations, you have just created an instruction set simulator! So what now? We program it!

Programming your microprocessor, or simulating machine code

In your memory definitions, you could set specific memory locations to your instructions and then run as normal, but this is rather annoying. I would suggest coming up with some program format that can be translated into values and put in the memory. You could make some sort of assembly language or use hexadecimal values, but I'm just going to go with a string with new instructions on newlines, 4 binary bits for the type, a space, and 4 bits for the address. He is what the example "Fibonacci" program described in the MASIC tutorial looks like. It has instructions, some blank unused memory, and some data values:

I added the comment at the bottom because this program loops infinitely and after 41 cycles it reaches the maximum value that can be stored in 8 bits. After that it will overflow and produce incorrect results.

To run this program, we need to parse the string and load it into memory. We will probably want to make sure the memory and registers are reset first, and run the program after it's loaded into memory. Here's my function:

It optionally takes a number of cycles (the max is set at 10,000) and a value for the program counter, for if you wanted to start the program at a different instruction. It iterates through each line in the the "program" specified and sets the memory at that place to the integer value in the program, parsed as binary (base 2). It then RUNs the program for the given cycles (or 10,000) and then logs the message "process finished".

Here's another short program that I wrote that uses the new "Input" command:

Running is as easy as:

Wrap up

Wow! I've written a lot and it's getting late, but I hope you find this tutorial useful. The repl I added below (also available here), shows the finished product and runs the "Fibonacci" and "square" programs when you run it. It also contains a MASIC.md file which contains the chart of instruction and an programs.txt file that contains some program challenges. Programming MASIC is a challenge! You'll find 16 bytes of memory pretty restrictive!

In the comments, please add your own ISS if you make one, any port of the MASIC ISS, and any programs you create for the MASIC ISS or your own ISS!

Voters
lynwod
ANDREWVOSS
BotsBoots
Highwayman
RyanYanko
programmeruser
Comments
hotnewtop
ANDREWVOSS

I know it's been 9 months, but I recently remembered this post and made a C++ port of your ISS. Here's the repl link

RyanYanko

@ANDREWVOSS bing bong

ANDREWVOSS