# 🔎 Creating a VM for fun - Part 2: C

Contents
Note
This is the second part of my series about creating a custom virtual machine: part 1 - assembly VM.

The code is here: https://github.com/OxNinja/C-VM

## Introduction

Doing the PoC in assembly (see part 1) gave me enough information about how to create my own virtual machine, I got the basic concepts for such a subject, such as:

• Parsing opcode
• Emulating instruction
• Use of virtual registers

It also showed me the limitations of this language, I needed a more sofisticated yet low level one, C was the perfect match 🥵.

## Architecture

I came with the following flow for the VM:

## PoC||GTFO

Let’s break down how I created this VM.

### Registers

In order to store information such as inputs or outputs, I needed some registers. These must be readable and writeable from anywhere in my program, I needed to make either a forbidden global variable, or create a local one which will be passed to the used functions. I went with the second solution, as it is less dirty, and I wanted to try my best by using pointers and C stuff.

I created a struct for my registers:

  1 2 3 4 5 6 7 8 9 10 11 12 13  typedef struct Registers { // common operations registers int a, b, c, d; // array to work with when manipulating registers' indexes // each element will point to the address of the correspponding register // see setup_registers() for more details int *registers[4]; // flags are stored in one integer, using masks to extract them // remainder, zero (cmp) int flags; } Registers; 

I started to work with only 4 registers a, b, c, d and one for the flags after instruction’s execution.

I also needed a way to (re)set the said registers to whatever I wanted, so I created this reset function:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32  /* Set all registers in the array in order to easyliy manupulate them * like: regs->registers[2] = 0x2a; * is the same as: regs->c = 0x2a; */ void setup_registers(Registers *regs) { // each element of the array points to the corresponding register's address regs->registers[0] = ®s->a; regs->registers[1] = ®s->b; regs->registers[2] = ®s->c; regs->registers[3] = ®s->d; } /* Just print the values */ void print_registers(Registers *regs) { printf("=== Registers: ===\n"); printf("a: 0x%x\n", regs->a); printf("b: 0x%x\n", regs->b); printf("c: 0x%x\n", regs->c); printf("d: 0x%x\n", regs->d); printf("flags: 0x%x\n", regs->flags); } /* Force the values of the registers to 0 */ void reset_registers(Registers *regs) { regs->a = 0; regs->b = 0; regs->c = 0; regs->d = 0; regs->flags = 0; } 

### Emulation

Emulating an instruction is very basic:

1. Parse the input
2. Detect the corresponding instruction
3. Execute the instruction

But I wanted to do something a bit fancy here: instead of just make a big 0xswitch statement, I created a map, or more precisely an array of pointers of functions. Meaning that each entry of the array is a pointer, pointing to the corresponding function to call:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22  void emulate(Registers *regs, int shellcode) { // parsing the input to extract only the opcode int opcode = (shellcode & 0xff000000) >> 0x18; // instructions is an array of pointers of function // each index points to the according function corresponding to the opcode // it is very easy to change the opcode for a certain function void (*instructions[10])(Registers *, int); // no opcode 0 defined for the moment instructions[1] = my_mov; instructions[2] = my_push; instructions[3] = my_add; instructions[4] = my_sub; instructions[5] = my_jmp; instructions[6] = my_cmp; instructions[7] = my_call; instructions[8] = my_exit; instructions[9] = my_pop; // code ommited for future spoilers redacted(); } 

Why this “crazy” stuff instead of the good old switch? You may ask. Well, for the sake of simplicity, yes, s i m p l i c i t y, I used this strategy for a good reason:

 1 2 3  // calling the corresponding function only takes 1 line of code, // and no processing at all: no if, nor loop (*instructions[opcode])(regs, shellcode); 

Then each function, such as my_mov and so, do the wanted behaviour of the corresponding instruction, for example:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  /* Moves the value into the register * value is either a register or a plain hex integer */ void my_mov(Registers *regs, int shellcode) { int is_reg1 = (shellcode & 0x00f00000) >> 0x14; if (is_reg1 == 0x1) { // get index of target reg int target_reg = (shellcode & 0x000f0000) >> 0x10; // get value to mov int is_reg2 = (shellcode & 0x0000f000) >> 0xc; // get moved value int value = (shellcode & 0x00000fff); // if source is a register and not a value if (is_reg2 == 0x1) { int source_reg = value >> 0x8; value = *regs->registers[source_reg]; } // finally, move the value into the register *regs->registers[target_reg] = value; } else { except("Invalid value for mov (arg a is not a register)"); } } 

Which leads to my next subject: parsing.

### Parsing

Yes, I did not mentionned how my instructions are encoded and how to parse them. See the following scheme to understand my way of crafting one instruction:

opcodeisReg1value1isReg2value2
01100045

This instruction (0x1100045) is a mov a, 0x45. Yes this is a bit silly but here is an another scheme in order to better explain my way of encoding my instructions:

And here is the size of each portion of the instruction:

• opcode: word
• isReg1: byte
• value1: byte if isReg1 == 1, any size else (depends on the instruction)
• isReg2: byte
• value2: byte if isReg2 == 1, any size else (depends on the instruction)

I made the choice to use a constant-sized instruction set, to help me parsing each one, instead of having to hardcode every variant that a variable-length instruction set would require.

Once this logic has been declared, there was one thing left to do: actually parsing the instructions. In fact, as you may have noticed in my instruction functions (my_mov() my_add()...), I used binary masking and shifting like so: (a && 0xff) >> 0x10.

## Demo time

I used the following code:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24  int main(void) { // init the struct Registers regs; // setup the registers' array setup_registers(®s); // set everything to 0 reset_registers(®s); print_registers(®s); // mov a, 0x45 emulate(®s, 0x1100045); print_registers(®s); // mov c, 0x2 emulate(®s, 0x1120002); print_registers(®s); // exit(a) emulate(®s, 0x8000000); // yes, pointless return but it is for the personnal ethics return 0; } 

Thanks

Special thanks to:

• Masterfox: for helping me debugging my issues with pointers and structs
• Nofix: for helping me debugging my array of pointers of functions stuff