To keep things short: I came across “How to write a virtual machine in order to hide your viruses and break your brain forever” by @s01den, published in tmp.out’s second edition. This paper made me enjoy low-level programming once again. I wanted to know more about the abstract subject of “virtual machines” in reverse engineering, so I read it and started implementing my own VM in assembly!
You will find my code on my GitHub repo.
Why assembly?#
I wanted to understand everything I did during this process, so I needed to stick to the lowest level I could. I will talk about the future of this project at the end of the post.
I was also already familiar with assembly, especially NASM on Linux, and wanted to test my knowledge.
Design#
Before starting to type very fast on my keyboard, I needed to put things down on paper to get a clear overview of the project.
I had to answer a few questions:
What is an instruction?
It’s like a function, or an alias for some code to execute.
How does the CPU know what to do with an instruction?
The code an instruction represents is already written for the CPU, so the CPU can execute it as-is.
How can I make custom instructions?
Just implement some functions or code blocks, then map them to a custom “OPcode”.
How can I make the CPU execute my custom instructions?
Add a simple condition on the custom OPcode, and execute the code mapped to it.
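The four answers above can be sketched in a few lines of C (not the NASM used for the actual VM, just a shorter way to show the idea). Everything here is hypothetical and only for illustration: the `vm_state` struct, the `op_inc` and `op_double` instructions, their opcode values, and the `execute` helper are not taken from the repo.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VM state: a single accumulator is enough to show the idea. */
typedef struct {
    int64_t acc;
} vm_state;

/* A custom instruction is just a function that operates on the VM state. */
typedef void (*handler_fn)(vm_state *vm);

static void op_inc(vm_state *vm)    { vm->acc += 1; }
static void op_double(vm_state *vm) { vm->acc *= 2; }

/* Map each custom opcode to its code. These opcode values are made up. */
static const handler_fn handlers[] = {
    NULL,       /* 0x0: unused */
    op_inc,     /* 0x1 */
    op_double,  /* 0x2 */
};

/* Execute one opcode: returns 0 on success, -1 if the opcode is unknown. */
static int execute(vm_state *vm, uint8_t opcode)
{
    size_t count = sizeof handlers / sizeof handlers[0];

    /* The "simple condition": only run code that is mapped to the opcode. */
    if (opcode >= count || handlers[opcode] == NULL)
        return -1;

    handlers[opcode](vm);
    return 0;
}
```

Starting from `acc = 3`, calling `execute` with `0x1` and then `0x2` leaves `acc` at 8, while an unmapped opcode such as `0x7` returns -1 and leaves the state untouched.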
PoC#
Registers#
The code to (re)set the registers is very straightforward and doesn’t really need an explanation, right?
```nasm
reset_registers:
    push rbp
    mov rbp, rsp

    ;; zero the registers used by the VM
    xor rax, rax    ;; opcode register
    xor rbx, rbx    ;; operand a
    xor rcx, rcx    ;; operand b
    xor rdx, rdx
    xor r8, r8

    leave
    ret
```
Instructions#
I decided to implement a very small number of instructions, as I already plan to upgrade this project in the future. I only need a proof of concept before going big.
OPcode | Instruction | NASM |
---|---|---|
0x1 | mov a, b | mov rbx, rcx |
0x2 | push a | push rbx |
0x3 | add a, b | add rbx, rcx |
0x4 | jmp a | jmp rbx |
Yes, some very basic instructions.
Execution#
The concept here is to compare rax, our opcode register, against each known opcode and jump to the corresponding handler:
```nasm
;; if opcode == 0x1:
;;     mov_a_b()
cmp rax, 0x1
je mov_a_b

cmp rax, 0x2
je push_a

cmp rax, 0x3
je add_a_b

;; and so on with every opcode

call _exit ;; default if unrecognized opcode
```
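For comparison, here is a rough C sketch of the same compare-and-dispatch chain, wrapped in a loop that reads opcodes from a byte array. It rests on several assumptions that are not in the PoC: the `vm_t` struct, the software stack, the `pc` index, the `max_steps` budget and the `vm_run` name are all hypothetical, `a` and `b` stand in for rbx and rcx, and `jmp a` is interpreted as a jump to a bytecode index rather than to a native address.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values taken from the instruction table of this post. */
enum { OP_MOV = 0x1, OP_PUSH = 0x2, OP_ADD = 0x3, OP_JMP = 0x4 };

#define STACK_SIZE 16

/* Hypothetical VM state; a and b stand in for rbx and rcx. */
typedef struct {
    uint64_t a, b;
    uint64_t stack[STACK_SIZE];
    size_t sp;   /* number of values currently on the stack */
    size_t pc;   /* index of the next opcode to read */
} vm_t;

/*
 * Runs until the end of the program, an unrecognized opcode, a full stack,
 * or the step budget is spent. Returns the number of instructions executed.
 */
static size_t vm_run(vm_t *vm, const uint8_t *code, size_t len, size_t max_steps)
{
    size_t steps = 0;

    while (vm->pc < len && steps < max_steps) {
        uint8_t opcode = code[vm->pc++];

        switch (opcode) {            /* plays the role of the cmp/je chain */
        case OP_MOV:
            vm->a = vm->b;
            break;
        case OP_PUSH:
            if (vm->sp == STACK_SIZE)
                return steps;        /* stack full: stop */
            vm->stack[vm->sp++] = vm->a;
            break;
        case OP_ADD:
            vm->a += vm->b;
            break;
        case OP_JMP:
            vm->pc = (size_t)vm->a;  /* assumption: a holds a bytecode index */
            break;
        default:
            return steps;            /* unrecognized opcode: stop, like _exit */
        }
        steps++;
    }
    return steps;
}
```

With `b = 5` and the program bytes `0x1, 0x3, 0x2, 0x9`, the loop executes three instructions, leaves `a` at 10 with one value on the stack, and stops at the unknown opcode `0x9`. The `max_steps` budget is only there so that a `jmp` loop cannot run forever in this sketch.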
Future#
In a future post I will cover how to improve this VM, especially by adding fully emulated virtual memory, using C. 😎