🔎 Creating a VM for fun - Part 2: C
The code is here: https://github.com/OxNinja/C-VM
Doing the PoC in assembly (see part 1) taught me enough to create my own virtual machine. I picked up the basic concepts, such as:
- Parsing opcodes
- Emulating instructions
- Using virtual registers
It also showed me the limitations of assembly: I needed a more sophisticated yet still low-level language, and C was the perfect match 🥵.
I came up with the following flow for the VM:
Let’s break down how I created this VM.
In order to store information such as inputs or outputs, I needed some registers. These must be readable and writable from anywhere in my program, so I had to either make a forbidden global variable or create a local one and pass it to the functions that use it. I went with the second solution, as it is less dirty, and I wanted to try my best with pointers and C stuff.
I created a struct for my registers:
I started with only 4 registers (a, b, c, d), plus one for the flags set after an instruction’s execution.
I also needed a way to (re)set the said registers to whatever I wanted, so I created this reset function:
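To make this concrete, here is a minimal sketch of what the struct and the reset function could look like. The names `Registers` and `reset_registers`, and the 32-bit register width, are assumptions for illustration and may differ from what is in the repo:

```c
#include <stdint.h>

/* Hypothetical layout: four general-purpose registers plus a flags
 * register. The 32-bit width is an assumption. */
typedef struct {
    uint32_t a;
    uint32_t b;
    uint32_t c;
    uint32_t d;
    uint32_t flags; /* updated after an instruction executes */
} Registers;

/* Set every register, flags included, to the same value.
 * The struct is passed by pointer, so no global variable is needed. */
static void reset_registers(Registers *regs, uint32_t value) {
    regs->a = value;
    regs->b = value;
    regs->c = value;
    regs->d = value;
    regs->flags = value;
}
```

The caller owns a local `Registers` and hands `&regs` to every function that needs it, which is the "local variable passed around" approach described above.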
Emulating an instruction is very basic:
- Parse the input
- Detect the corresponding instruction
- Execute the instruction
But I wanted to do something a bit fancy here: instead of just making a big switch statement, I created a map, or more precisely an array of function pointers. Each entry of the array is a pointer to the function to call for the corresponding instruction:
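As a rough sketch of the idea, assuming the opcode is used directly as the array index: the handler signature, the `dispatch` helper and the opcode `0x2` for `my_add` are my assumptions here, and the handlers are deliberately simplified placeholders.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t a, b, c, d, flags;
} Registers;

/* Every handler shares one signature so it fits in the same array. */
typedef void (*InstructionHandler)(Registers *regs, uint32_t operand);

/* Simplified placeholders: they treat the operand as an immediate
 * and always target register a. */
static void my_mov(Registers *regs, uint32_t operand) { regs->a = operand; }
static void my_add(Registers *regs, uint32_t operand) { regs->a += operand; }

/* The opcode is the index into the table (hypothetical numbering). */
static const InstructionHandler handlers[] = {
    NULL,   /* 0x0: unused */
    my_mov, /* 0x1 */
    my_add, /* 0x2 */
};

#define HANDLER_COUNT (sizeof handlers / sizeof handlers[0])

/* Look up the handler and call it. Returns -1 for an unknown opcode. */
static int dispatch(Registers *regs, uint8_t opcode, uint32_t operand) {
    if (opcode >= HANDLER_COUNT || handlers[opcode] == NULL) {
        return -1;
    }
    handlers[opcode](regs, operand);
    return 0;
}
```

Adding an instruction then means writing one function and adding one entry to the array, with no change to `dispatch` itself.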
Why this “crazy” stuff instead of the good old switch, you may ask? Well, for the sake of simplicity, yes, s i m p l i c i t y, I used this strategy for a good reason:
Then each function, such as my_mov and so on, implements the behaviour of the corresponding instruction, for example:
Which leads to my next subject: parsing.
Yes, I have not yet mentioned how my instructions are encoded and how to parse them. See the following scheme to understand how I craft one instruction:
This instruction (0x1100045) is a mov a, 0x45. Yes, this is a bit silly, but here is another scheme to better explain how I encode my instructions:
And here is the size of each portion of the instruction:
- value1: byte if isReg1 == 1, any size otherwise (depends on the instruction)
- value2: byte if isReg2 == 1, any size otherwise (depends on the instruction)
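To illustrate, here is a small encoder sketch. The exact bit positions are an assumption I inferred from the single example `0x1100045` (opcode `0x1`, isReg1 `1`, value1 `0x00` for register a, isReg2 `0`, value2 `0x45`); the sketch also limits both values to one byte, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed field positions, inferred from 0x1100045 == mov a, 0x45:
 *   bits 24-27 opcode | 20-23 isReg1 | 12-19 value1
 *   bits  8-11 isReg2 |  0-7  value2 */
enum {
    VALUE2_SHIFT  = 0,
    IS_REG2_SHIFT = 8,
    VALUE1_SHIFT  = 12,
    IS_REG1_SHIFT = 20,
    OPCODE_SHIFT  = 24
};

/* Pack the five fields into one fixed-size 32-bit instruction. */
static uint32_t encode_instruction(uint8_t opcode, int is_reg1, uint8_t value1,
                                   int is_reg2, uint8_t value2) {
    return ((uint32_t)(opcode & 0xf) << OPCODE_SHIFT)
         | ((uint32_t)(is_reg1 ? 1 : 0) << IS_REG1_SHIFT)
         | ((uint32_t)value1 << VALUE1_SHIFT)
         | ((uint32_t)(is_reg2 ? 1 : 0) << IS_REG2_SHIFT)
         | ((uint32_t)value2 << VALUE2_SHIFT);
}
```

Under these assumptions, `encode_instruction(0x1, 1, 0x00, 0, 0x45)` gives back `0x1100045`.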
I chose a constant-sized instruction set to make parsing easier, instead of having to hardcode every variant that a variable-length instruction set would require.
Once this logic was defined, there was one thing left to do: actually parsing the instructions. As you may have noticed in my instruction functions (my_mov(), my_add()...), I used binary masking and shifting like so: (a >> 0x10) & 0xff.
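Here is a sketch of what a full decode could look like with that technique. The field positions are again an assumption inferred from `0x1100045`, and `Decoded` and `decode_instruction` are hypothetical names. Two details matter: shift first and mask second, and use the bitwise `&` (the logical `&&` would only ever yield 0 or 1).

```c
#include <stdint.h>

/* One decoded instruction (hypothetical helper type). */
typedef struct {
    uint8_t opcode;
    uint8_t is_reg1;
    uint8_t value1;
    uint8_t is_reg2;
    uint8_t value2;
} Decoded;

/* Assumed layout:
 *   bits 24-27 opcode | 20-23 isReg1 | 12-19 value1
 *   bits  8-11 isReg2 |  0-7  value2
 * Each field is moved down to bit 0, then everything above it is masked off. */
static Decoded decode_instruction(uint32_t ins) {
    Decoded d;
    d.opcode  = (uint8_t)((ins >> 24) & 0xf);
    d.is_reg1 = (uint8_t)((ins >> 20) & 0xf);
    d.value1  = (uint8_t)((ins >> 12) & 0xff);
    d.is_reg2 = (uint8_t)((ins >> 8) & 0xf);
    d.value2  = (uint8_t)(ins & 0xff);
    return d;
}
```

Decoding `0x1100045` this way yields opcode `0x1`, isReg1 `1`, value1 `0x00`, isReg2 `0` and value2 `0x45`, which matches mov a, 0x45 if register a is index 0.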
This was probably the most difficult part of this project for me, as I had to figure out how to implement a virtual stack and related stuff.
I first thought about using a pointer to a malloced chunk as the stack, where I could store pointers to the values, so here is the struct:
A few explanations about this probably cursed struct:
- max_size is the size of the stack (the maximum number of pointers that can be stored in it).
- *stack is a pointer to the allocated chunk of memory that stores the pointers.
- *stack_base is a pointer to the base of the stack (the first place to store pointers at).
- *stack_end is a pointer to the end of the stack (the limit of its size).
- **stack_pointer is a pointer to the current “cursor” in the stack, pointing to the pointer stored there.
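A minimal sketch of a struct matching these fields, with an init and a free helper. The field types (`void *` and `void **`), the type name `Stack` and both helper functions are my assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    size_t max_size;      /* capacity, in number of stored pointers */
    void *stack;          /* the malloc'ed chunk */
    void *stack_base;     /* first slot of the chunk */
    void *stack_end;      /* one past the last slot */
    void **stack_pointer; /* cursor: next free slot */
} Stack;

/* Allocate room for max_size pointers. Returns -1 on failure. */
static int stack_init(Stack *s, size_t max_size) {
    if (max_size == 0 || max_size > SIZE_MAX / sizeof(void *)) {
        return -1;
    }
    s->stack = malloc(max_size * sizeof(void *));
    if (s->stack == NULL) {
        return -1;
    }
    s->max_size = max_size;
    s->stack_base = s->stack;
    s->stack_end = (void **)s->stack + max_size;
    s->stack_pointer = (void **)s->stack_base; /* empty stack */
    return 0;
}

/* Release the chunk and clear every field. */
static void stack_free(Stack *s) {
    free(s->stack);
    s->stack = NULL;
    s->stack_base = NULL;
    s->stack_end = NULL;
    s->stack_pointer = NULL;
    s->max_size = 0;
}
```

Keeping both `stack_base` and `stack_end` makes bounds checks cheap: the cursor is valid for a push only while it sits below `stack_end`, and valid for a pop only while it sits above `stack_base`.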
Feel free to visit the project’s repo to check whether I finished this implementation, but for now I am definitely still struggling with it.
With this virtual stack, the VM is now able to push, pop and all that stuff. Here is how I implemented them:
The same goes for the pop, except that we first decrement the stack pointer index, as we incremented it last, and then we store the pointed value into the corresponding register.
I used the following code:
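As a simplified sketch of that push/pop order (not the repo's exact code): this version stores plain 32-bit values in a fixed-size array instead of pointers in a malloced chunk, and the names `Stack`, `stack_push`, `stack_pop` and `STACK_SLOTS` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 16 /* arbitrary capacity for the sketch */

typedef struct {
    uint32_t slots[STACK_SLOTS];
    size_t sp; /* index of the next free slot */
} Stack;

/* Push: store first, then increment. Returns -1 if the stack is full. */
static int stack_push(Stack *s, uint32_t value) {
    if (s->sp >= STACK_SLOTS) {
        return -1;
    }
    s->slots[s->sp] = value;
    s->sp++;
    return 0;
}

/* Pop: decrement first (push incremented last), then copy the value
 * into the destination register. Returns -1 if the stack is empty. */
static int stack_pop(Stack *s, uint32_t *reg) {
    if (s->sp == 0) {
        return -1;
    }
    s->sp--;
    *reg = s->slots[s->sp];
    return 0;
}
```

The two bounds checks are what keep a buggy guest program from reading or writing outside the virtual stack.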