Creating a VM for fun - Part 2: C

The code is here: https://github.com/OxNinja/C-VM
Introduction
Doing the PoC in assembly (see part 1) gave me enough information about how to create my own virtual machine. I got the basic concepts for such a project, such as:
- Parsing opcodes
- Emulating instructions
- Using virtual registers
It also showed me the limitations of this language. I needed a more sophisticated yet still low-level one, and C was the perfect match.
Architecture
I came up with the following flow for the VM:
PoC||GTFO
Let’s break down how I created this VM.
Registers
In order to store information such as inputs or outputs, I needed some registers. These must be readable and writable from anywhere in my program, so I needed either a forbidden global variable or a local one passed to the functions that use it. I went with the second solution, as it is less dirty, and I wanted to try my best at using pointers and other C stuff.
I created a struct for my registers:
I started with only 4 registers (a, b, c, d) and one for the flags set after an instruction's execution.
I also needed a way to (re)set the said registers to whatever I wanted, so I created this reset function:
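A minimal sketch of what the register struct and the reset function could look like. The type name Registers, the function name reset_registers, the 32-bit register width and the choice to clear the flags on reset are all assumptions made for illustration; the repo is the reference for the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: four general-purpose registers plus a flags register. */
typedef struct {
    uint32_t a;
    uint32_t b;
    uint32_t c;
    uint32_t d;
    uint32_t flags;
} Registers;

/* Set a, b, c and d to `value` and clear the flags (assumed behaviour). */
void reset_registers(Registers *regs, uint32_t value) {
    assert(regs != NULL);
    regs->a = value;
    regs->b = value;
    regs->c = value;
    regs->d = value;
    regs->flags = 0;
}
```

With this shape, the struct can live as a local variable in the caller and be handed to every function by pointer, for example reset_registers(&regs, 0), so no global is needed.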
Emulation
Emulating an instruction is very basic:
- Parse the input
- Detect the corresponding instruction
- Execute the instruction
But I wanted to do something a bit fancy here: instead of just making a big switch statement, I created a map, or more precisely an array of function pointers. Each entry of the array is a pointer to the corresponding function to call:
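The idea of an array of function pointers indexed by opcode can be sketched as follows. This is an illustration under assumptions, not the repo's code: the names instruction_fn, instructions and emulate are hypothetical, the handlers are stubs, the opcode is assumed to sit in the top byte of a 32-bit instruction (as the example in the Parsing section suggests), and 0x02 for add is a made-up opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t a, b, c, d, flags;
} Registers;

/* Every handler shares one signature so they can all live in the same array. */
typedef void (*instruction_fn)(Registers *regs, uint32_t instruction);

/* Stub handlers: they only touch register a, real ones would decode operands. */
void my_mov(Registers *regs, uint32_t instruction) { regs->a = instruction & 0xfff; }
void my_add(Registers *regs, uint32_t instruction) { regs->a += instruction & 0xfff; }

/* Index = opcode. Slot 0 stays NULL to mean "no such instruction". */
instruction_fn instructions[] = { NULL, my_mov, my_add };

#define INSTRUCTION_COUNT (sizeof instructions / sizeof instructions[0])

/* Look the handler up by opcode and call it. Returns 0, or -1 if unknown. */
int emulate(Registers *regs, uint32_t instruction) {
    assert(regs != NULL);
    uint32_t opcode = instruction >> 24;
    if (opcode >= INSTRUCTION_COUNT || instructions[opcode] == NULL) {
        return -1;
    }
    instructions[opcode](regs, instruction);
    return 0;
}
```

Adding an instruction then means writing one handler and appending it to the array; the dispatch code itself does not change.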
Why this “crazy” stuff instead of the good old switch, you may ask? Well, for the sake of simplicity, yes, s i m p l i c i t y, I used this strategy for a good reason:
Then each function, such as my_mov and so on, implements the behaviour of the corresponding instruction, for example:
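As an illustration, a mov handler could look roughly like the sketch below. It assumes the field layout of the 0x1100045 example from the Parsing section (8-bit opcode, then three 4-bit fields, then a 12-bit value) and a hypothetical helper select_register that maps indexes 0 to 3 onto a, b, c, d. Ignoring invalid operands silently is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t a, b, c, d, flags;
} Registers;

/* Map a register index (0..3) to a, b, c, d; NULL for anything else. */
uint32_t *select_register(Registers *regs, uint32_t index) {
    switch (index) {
    case 0: return &regs->a;
    case 1: return &regs->b;
    case 2: return &regs->c;
    case 3: return &regs->d;
    default: return NULL;
    }
}

/* mov dst, src: dst must be a register, src is a register or an immediate. */
void my_mov(Registers *regs, uint32_t instruction) {
    assert(regs != NULL);
    uint32_t is_reg1 = (instruction >> 20) & 0xf;
    uint32_t value1  = (instruction >> 16) & 0xf;
    uint32_t is_reg2 = (instruction >> 12) & 0xf;
    uint32_t value2  = instruction & 0xfff;

    uint32_t *dst = is_reg1 ? select_register(regs, value1) : NULL;
    if (dst == NULL) {
        return; /* invalid destination: ignore the instruction */
    }
    if (is_reg2) {
        uint32_t *src = select_register(regs, value2);
        if (src != NULL) {
            *dst = *src; /* register to register */
        }
    } else {
        *dst = value2;   /* immediate to register */
    }
}
```

Under these assumptions, my_mov(&regs, 0x01100045) performs mov a, 0x45 and my_mov(&regs, 0x01111000) performs mov b, a.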
Which leads to my next subject: parsing.
Parsing
Yes, I have not mentioned yet how my instructions are encoded and how to parse them. See the following scheme to understand how I craft one instruction:
opcode | isReg1 | value1 | isReg2 | value2 |
---|---|---|---|---|
01 | 1 | 0 | 0 | 045 |
This instruction (0x1100045) is a mov a, 0x45. Yes, this is a bit silly, but here is another scheme to better explain how I encode my instructions:
And here is the size of each portion of the instruction:
- opcode: word
- isReg1: byte
- value1: byte if isReg1 == 1, any size otherwise (depends on the instruction)
- isReg2: byte
- value2: byte if isReg2 == 1, any size otherwise (depends on the instruction)
I made the choice to use a constant-sized instruction set to make parsing each instruction easier, instead of having to hardcode every variant that a variable-length instruction set would require.
Once this logic was defined, there was one thing left to do: actually parsing the instructions. In fact, as you may have noticed in my instruction functions (my_mov(), my_add()...), I used binary masking and shifting like so: (a & 0xff0000) >> 0x10.
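To make the masking and shifting concrete, here is a small sketch that splits an instruction into its fields and builds it back. The field widths are read off the 0x1100045 example (8 + 4 + 4 + 4 + 12 = 32 bits), which is an assumption: they may not match the sizes used in the repo. The names Decoded, decode and encode are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded instruction, with the bit range each field is assumed to use. */
typedef struct {
    uint8_t  opcode;   /* bits 31..24 */
    uint8_t  is_reg1;  /* bits 23..20 */
    uint8_t  value1;   /* bits 19..16 */
    uint8_t  is_reg2;  /* bits 15..12 */
    uint16_t value2;   /* bits 11..0  */
} Decoded;

/* Mask out each field, then shift it down to bit 0. */
void decode(uint32_t instruction, Decoded *out) {
    assert(out != NULL);
    out->opcode  = (uint8_t)((instruction & 0xff000000u) >> 24);
    out->is_reg1 = (uint8_t)((instruction & 0x00f00000u) >> 20);
    out->value1  = (uint8_t)((instruction & 0x000f0000u) >> 16);
    out->is_reg2 = (uint8_t)((instruction & 0x0000f000u) >> 12);
    out->value2  = (uint16_t)(instruction & 0x00000fffu);
}

/* Inverse operation: shift each field back into place and OR them together. */
uint32_t encode(const Decoded *in) {
    assert(in != NULL);
    return ((uint32_t)in->opcode << 24)
         | ((uint32_t)(in->is_reg1 & 0xfu) << 20)
         | ((uint32_t)(in->value1 & 0xfu) << 16)
         | ((uint32_t)(in->is_reg2 & 0xfu) << 12)
         | ((uint32_t)in->value2 & 0xfffu);
}
```

Decoding 0x01100045 this way gives opcode 0x01, isReg1 1, value1 0, isReg2 0 and value2 0x45, which matches the mov a, 0x45 reading of the table.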
Stack implementation
This is probably the most difficult part of this project for me, as I had to figure out how to implement a virtual stack and related stuff.
I first thought about using a pointer to a malloc-ed chunk as the stack, where I could store pointers to the values, so here is the struct:
A few explanations about this probably cursed struct:
- max_size is the size of the stack (the maximum number of pointers that can be stored in it).
- *stack is a pointer to the allocated chunk in memory used to store the pointers.
- *stack_base is a pointer to the base of the stack (the first place to store pointers at).
- *stack_end is a pointer to the end of the stack (the limit of its size).
- **stack_pointer is a pointer to the current “cursor” in the stack, pointing to the stored pointer in it.
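One way to type such a struct consistently is sketched below. The field names come from the list above, but the void ** types and the helper names stack_init and stack_destroy are assumptions chosen so the sketch compiles cleanly, and the star counts differ from the ones listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t max_size;       /* maximum number of stored pointers */
    void **stack;          /* allocated chunk holding the pointers */
    void **stack_base;     /* first usable slot */
    void **stack_end;      /* one past the last usable slot */
    void **stack_pointer;  /* cursor: next free slot */
} Stack;

/* Allocate room for max_size pointers. Returns 0 on success, -1 on failure. */
int stack_init(Stack *s, size_t max_size) {
    assert(s != NULL);
    if (max_size == 0) {
        return -1;
    }
    s->stack = calloc(max_size, sizeof *s->stack);
    if (s->stack == NULL) {
        return -1;
    }
    s->max_size = max_size;
    s->stack_base = s->stack;
    s->stack_end = s->stack + max_size;
    s->stack_pointer = s->stack_base;
    return 0;
}

/* Release the chunk and leave the struct in an empty, safe state. */
void stack_destroy(Stack *s) {
    assert(s != NULL);
    free(s->stack);
    s->stack = s->stack_base = s->stack_end = s->stack_pointer = NULL;
    s->max_size = 0;
}
```

In this version stack and stack_base hold the same address. Keeping both fields only mirrors the description above.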
Feel free to visit the project’s repo to check if I finished this implementation, but for now I am definitely struggling with it.
With this virtual stack the VM is now able to push & pop and all that stuff. Here is how I implemented them:
The same goes for pop, except that we first decrement the stack pointer, as we incremented it last, and then we store the pointed-to value into the corresponding register.
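The push and pop order described here (store then increment, decrement then load) can be sketched as follows. The signatures, the bounds checks and the -1 error code are assumptions. The sketch also assumes each stored pointer refers to a 32-bit value that stays alive until it is popped, which the caller must guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t max_size;
    void **stack;
    void **stack_base;
    void **stack_end;
    void **stack_pointer;
} Stack;

/* Store the pointer at the cursor, then advance it. Returns -1 when full. */
int push(Stack *s, void *item) {
    assert(s != NULL);
    if (s->stack_pointer >= s->stack_end) {
        return -1;
    }
    *s->stack_pointer = item;
    s->stack_pointer++;
    return 0;
}

/* Step the cursor back first (push advanced it last), then copy the
 * pointed-to value into the register. Returns -1 when empty. */
int pop(Stack *s, uint32_t *reg) {
    assert(s != NULL && reg != NULL);
    if (s->stack_pointer <= s->stack_base) {
        return -1;
    }
    s->stack_pointer--;
    *reg = *(uint32_t *)*s->stack_pointer;
    return 0;
}
```

Because the stack holds pointers rather than copies, changing a value after pushing its address also changes what pop later reads. Storing the values themselves would avoid that, at the cost of departing from the design described above.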
Demo time
I used the following code: