This is the second part of my series about creating a custom virtual machine: part 1 - assembly VM.
The code is here: https://github.com/OxNinja/C-VM
Introduction#
Doing the PoC in assembly (see part 1) gave me enough information about how to create my own virtual machine, I got the basic concepts for such a subject, such as:
- Parsing opcode
- Emulating instruction
- Use of virtual registers
It also showed me the limitations of this language, I needed a more sofisticated yet low level one, C was the perfect match π₯΅.
Architecture#
I came with the following flow for the VM:
PoC||GTFO#
Let’s break down how I created this VM.
Registers#
In order to store information such as inputs or outputs, I needed some registers. These must be readable and writeable from anywhere in my program, I needed to make either a forbidden global variable, or create a local one which will be passed to the used functions. I went with the second solution, as it is less dirty, and I wanted to try my best by using pointers and C stuff.
I created a struct for my registers:
1typedef struct Registers {
2 // common operations registers
3 int a, b, c, d;
4
5 // array to work with when manipulating registers' indexes
6 // each element will point to the address of the correspponding register
7 // see `setup_registers()` for more details
8 int *registers[4];
9
10 // flags are stored in one integer, using masks to extract them
11 // remainder, zero (cmp)
12 int flags;
13} Registers;
I started to work with only 4 registers a, b, c, d
and one for the flags after instruction’s execution.
I also needed a way to (re)set the said registers to whatever I wanted, so I created this reset function:
1/* Set all registers in the array in order to easyliy manupulate them
2 * like: regs->registers[2] = 0x2a;
3 * is the same as: regs->c = 0x2a;
4 */
5void setup_registers(Registers *regs) {
6 // each element of the array points to the corresponding register's address
7 regs->registers[0] = ®s->a;
8 regs->registers[1] = ®s->b;
9 regs->registers[2] = ®s->c;
10 regs->registers[3] = ®s->d;
11}
12
13/* Just print the values
14 */
15void print_registers(Registers *regs) {
16 printf("=== Registers: ===\n");
17 printf("a: 0x%x\n", regs->a);
18 printf("b: 0x%x\n", regs->b);
19 printf("c: 0x%x\n", regs->c);
20 printf("d: 0x%x\n", regs->d);
21 printf("flags: 0x%x\n", regs->flags);
22}
23
24/* Force the values of the registers to 0
25 */
26void reset_registers(Registers *regs) {
27 regs->a = 0;
28 regs->b = 0;
29 regs->c = 0;
30 regs->d = 0;
31 regs->flags = 0;
32}
Emulation#
Emulating an instruction is very basic:
- Parse the input
- Detect the corresponding instruction
- Execute the instruction
But I wanted to do something a bit fancy here: instead of just make a big 0xswitch
statement, I created a map, or more precisely an array of pointers of functions. Meaning that each entry of the array is a pointer, pointing to the corresponding function to call:
1void emulate(Registers *regs, int shellcode) {
2 // parsing the input to extract only the opcode
3 int opcode = (shellcode & 0xff000000) >> 0x18;
4
5 // instructions is an array of pointers of function
6 // each index points to the according function corresponding to the opcode
7 // it is very easy to change the opcode for a certain function
8 void (*instructions[10])(Registers *, int);
9 // no opcode 0 defined for the moment
10 instructions[1] = my_mov;
11 instructions[2] = my_push;
12 instructions[3] = my_add;
13 instructions[4] = my_sub;
14 instructions[5] = my_jmp;
15 instructions[6] = my_cmp;
16 instructions[7] = my_call;
17 instructions[8] = my_exit;
18 instructions[9] = my_pop;
19
20 // code ommited for future spoilers
21 redacted();
22}
Why this “crazy” stuff instead of the good old switch
? You may ask. Well, for the sake of simplicity, yes, s i m p l i c i t y, I used this strategy for a good reason:
1// calling the corresponding function only takes 1 line of code,
2// and no processing at all: no if, nor loop
3(*instructions[opcode])(regs, shellcode);
Then each function, such as my_mov
and so, do the wanted behaviour of the corresponding instruction, for example:
1/* Moves the value into the register
2 * value is either a register or a plain hex integer
3 */
4void my_mov(Registers *regs, int shellcode) {
5 int is_reg1 = (shellcode & 0x00f00000) >> 0x14;
6 if (is_reg1 == 0x1) {
7 // get index of target reg
8 int target_reg = (shellcode & 0x000f0000) >> 0x10;
9 // get value to mov
10 int is_reg2 = (shellcode & 0x0000f000) >> 0xc;
11 // get moved value
12 int value = (shellcode & 0x00000fff);
13 // if source is a register and not a value
14 if (is_reg2 == 0x1) {
15 int source_reg = value >> 0x8;
16 value = *regs->registers[source_reg];
17 }
18
19 // finally, move the value into the register
20 *regs->registers[target_reg] = value;
21
22 } else {
23 except("Invalid value for mov (arg a is not a register)");
24 }
25}
Which leads to my next subject: parsing.
Parsing#
Yes, I did not mentionned how my instructions are encoded and how to parse them. See the following scheme to understand my way of crafting one instruction:
opcode | isReg1 | value1 | isReg2 | value2 |
---|---|---|---|---|
01 | 1 | 0 | 0 | 045 |
This instruction (0x1100045
) is a mov a, 0x45
. Yes this is a bit silly but here is an another scheme in order to better explain my way of encoding my instructions:
And here is the size of each portion of the instruction:
opcode
: wordisReg1
: bytevalue1
: byte ifisReg1 == 1
, any size else (depends on the instruction)isReg2
: bytevalue2
: byte ifisReg2 == 1
, any size else (depends on the instruction)
I made the choice to use a constant-sized instruction set, to help me parsing each one, instead of having to hardcode every variant that a variable-length instruction set would require.
Once this logic has been declared, there was one thing left to do: actually parsing the instructions. In fact, as you may have noticed in my instruction functions (my_mov() my_add()...
), I used binary masking and shifting like so: (a && 0xff) >> 0x10
.
Demo time#
I used the following code:
1int main(void) {
2 // init the struct
3 Registers regs;
4 // setup the registers' array
5 setup_registers(®s);
6
7 // set everything to 0
8 reset_registers(®s);
9 print_registers(®s);
10
11 // mov a, 0x45
12 emulate(®s, 0x1100045);
13 print_registers(®s);
14
15 // mov c, 0x2
16 emulate(®s, 0x1120002);
17 print_registers(®s);
18
19 // exit(a)
20 emulate(®s, 0x8000000);
21
22 // yes, pointless return but it is for the personnal ethics
23 return 0;
24}
Special thanks to: