Skip to main content

πŸ”Ž Creating a VM for fun - Part 5: Python

·874 words·5 mins· loading · loading · · Draft
Custom VM python low-level reverse
0xNinja
Author
0xNinja
mov al, 11
Table of Contents
Custom VM - This article is part of a series.
Part 5: This Article

Introduction
#

OxNinja/Soft-Machine

Python custom virtual machine

Python
0
0

As I wanted to continue my VM journey, I needed some changes and started once again from scratch, in Python this time.

This time I wanted to make a package with some purposes, with the following features:

  • Exec compiled code
  • Assemble code into compiled code
  • Disassemble compiled code into assembly

VM workflow
#

The idea as described above, as for me, required a CLI in order to be a one-for-all tool. I went for a Python package based on click.

Here is an overview of what I thought of:

flowchart TB subgraph Parser 1[Assemble] 2[Disassemble] end subgraph VM direction RL A[Instructions] B[Stack] C[Heap] end subgraph CLI direction RL a[Exec] b[Assemble] c[Disassemble] end

Instructions
#

As usual, I implemented every instructions and used a global list for each one of them:

 1INSTRUCTIONS = (
 2        mov,
 3        push,
 4        pop,
 5        add,
 6        sub,
 7        cmp,
 8        call,
 9        jmp,
10        _exit
11        )

Here is the code for the mov instruction:

 1def mov(vm, opcode):
 2    # first arg is always a register
 3    target = opcode >> (vm.opcode_size - 2) * 4 & 0xf
 4    if target < 0 or target >= len(vm.regs.regs):
 5        print(f"Unexpected value for target: {target}. Must be between 0 (included) and {len(vm.regs.regs)} (excluded).")
 6        exit(1)
 7
 8    # second arg can be either a register, or a value
 9    is_reg = opcode >> (vm.opcode_size - 3) * 4 & 0xf
10
11    if is_reg == 0:
12        value = opcode & (16 ** (vm.opcode_size - 3) - 1)
13    elif is_reg == 1:
14        index = opcode >> (vm.opcode_size - 4) * 4 & 0xf
15        if index < 0 or index >= len(vm.regs.regs):
16            print(f"Unexpected value for index: {index}. Must be between 0 (included) and {len(vm.regs.regs)} (excluded).")
17            exit(1)
18        
19        value = vm.regs.get(index)
20
21    elif is_reg != 0 or is_reg != 1:
22        print(f"Unexpected value for is_reg: {is_reg}. Must be either 0 or 1.")
23        exit(1)
24
25    vm.regs.set(target, value)

VM
#

As you may have noticed, my code is more object-oriented. In addition, I made a class for every components of the project.

Here is the constructor for my VM:

1class VM:
2    def __init__(self):
3        self.stack = Stack()
4        self.regs = Registers()
5        self.opcode_size = 8

Yes, as Python is more permissive than C, I asked myself “why not making the opcodes configurable?”, leading to this self.opcode_size attribute.

In the future I would like to be able to fully manage the VM and making it really customizable, the same goes for the opcodes, the instructions, the stack, the heap…

There is not much more to look for this VM class, only the exec function, which redirects to the corresponding instruction.

Exec test
#

Let’s first manually compile the following assembly code into our VM’s opcodes:

1add a, 0x1000   ; 0x30001000
2mov b, a        ; 0x01100000
3sub b, 0x100    ; 0x41000100

We then exec this code:

1softmachine exec -s "0x30001000;0x01100000;0x41000100"                                 
2===== VM =====
3Registers:
4a: 4096
5b: 3840
6c: 0
7d: 0
8flags: 0

Stack
#

As I represented the stack as a list, it is trivial to push/pop values:

 1class Stack:
 2    def __init__(self, max_size=0xffffff):
 3        self.stack = list()
 4        self.max_size = max_size
 5        self.size = len(self.stack)
 6
 7    def push(self, value):
 8        if self.size < self.max_size:
 9            self.stack.append(value)
10            self.update()
11
12    def pop(self):
13        if self.size > 0:
14            self.stack.pop()
15            self.update()
16
17    def update(self):
18        self.size = len(self.stack)

And here is the corresponding code for the pop instruction:

 1def pop(vm, opcode):
 2    if vm.stack.size < 1:
 3        print("Cannot pop, the stack is empty.")
 4        exit(1)
 5
 6    # first arg is always a register
 7    target = opcode >> (vm.opcode_size - 2) * 4 & 0xf
 8    if target < 0 or target >= len(vm.regs.regs):
 9        print(f"Unexpected value for target: {target}. Must be between 0 (included) and {len(vm.regs.regs)} (excluded).")
10        exit(1)
11
12    value = vm.stack.stack[-1]
13    vm.regs.set(target, value)
14    vm.stack.pop()

So yeah, this is way easier to implement a VM in Python, as it abstracts a lot of things. The drawback is the loss of memory management.

Performances
#

Let’s talk a bit about performances.

To let myself track the differences regarding execution time, I made a stress test for the VM. Very simple in the first place because there are not that much features.

1class VM:
2    # ...
3
4    def stress_test(self, n):
5        for x in range(n):
6            INSTRUCTIONS[1](self, 0x10ffffff)
7        for x in range(n):
8            INSTRUCTIONS[2](self, 0x20000000)

This does n times push 0xffffff, then n times pop a. I wanted not to put push and pop in the same loop cycle, in order to monitor the memory usage in case of many allocations.

Here are the results on my machine:

1time softmachine stresstest -n 10000   
2# softmachine stresstest -n 10000  0.07s user 0.01s system 98% cpu 0.077 total
3
4time softmachine stresstest -n 1000000
5# softmachine stresstest -n 1000000  1.13s user 0.03s system 99% cpu 1.164 total
6
7time softmachine stresstest -n 10000000
8# softmachine stresstest -n 10000000  10.79s user 0.18s system 99% cpu 10.983 total

The execution time seems – for the moment – to be linear, depending on the number of instructions to execute.

Custom VM - This article is part of a series.
Part 5: This Article

Related

Draft
πŸ”Ž Creating a VM for fun - Part 2: C
·1225 words·6 mins· loading · loading
Custom VM c low-level reverse
Draft
πŸ”Ž Creating a VM for fun - Part 1: ASM
·451 words·3 mins· loading · loading
Custom VM assembly low-level reverse
Draft
πŸ”Ž Creating a VM for fun - Part 4: Implementation and refacto
·461 words·3 mins· loading · loading
Custom VM c low-level reverse
Draft
πŸ”Ž Creating a VM for fun - Part 3: C virtual stack
·919 words·5 mins· loading · loading
Custom VM c low-level reverse
🏌️ BGGP3 - How to crash a famous JS engine for fun
·1571 words·8 mins· loading · loading
Binary golfing bggp low-level reverse golf js fuzz
Les pyjails pour les dΓ©butants
·513 words·3 mins· loading · loading
ctf jail python