Introduction#
Python custom virtual machine
As I wanted to continue my VM journey, I needed some changes and started once again from scratch, in Python this time.
This time I wanted to make a package with some purposes, with the following features:
- Exec compiled code
- Assemble code into compiled code
- Disassemble compiled code into assembly
VM workflow#
The idea as described above, as for me, required a CLI in order to be a one-for-all tool. I went for a Python package based on click
.
Here is an overview of what I thought of:
Instructions#
As usual, I implemented every instructions and used a global list for each one of them:
1INSTRUCTIONS = (
2 mov,
3 push,
4 pop,
5 add,
6 sub,
7 cmp,
8 call,
9 jmp,
10 _exit
11 )
Here is the code for the mov
instruction:
1def mov(vm, opcode):
2 # first arg is always a register
3 target = opcode >> (vm.opcode_size - 2) * 4 & 0xf
4 if target < 0 or target >= len(vm.regs.regs):
5 print(f"Unexpected value for target: {target}. Must be between 0 (included) and {len(vm.regs.regs)} (excluded).")
6 exit(1)
7
8 # second arg can be either a register, or a value
9 is_reg = opcode >> (vm.opcode_size - 3) * 4 & 0xf
10
11 if is_reg == 0:
12 value = opcode & (16 ** (vm.opcode_size - 3) - 1)
13 elif is_reg == 1:
14 index = opcode >> (vm.opcode_size - 4) * 4 & 0xf
15 if index < 0 or index >= len(vm.regs.regs):
16 print(f"Unexpected value for index: {index}. Must be between 0 (included) and {len(vm.regs.regs)} (excluded).")
17 exit(1)
18
19 value = vm.regs.get(index)
20
21 elif is_reg != 0 or is_reg != 1:
22 print(f"Unexpected value for is_reg: {is_reg}. Must be either 0 or 1.")
23 exit(1)
24
25 vm.regs.set(target, value)
VM#
As you may have noticed, my code is more object-oriented. In addition, I made a class for every components of the project.
Here is the constructor for my VM:
1class VM:
2 def __init__(self):
3 self.stack = Stack()
4 self.regs = Registers()
5 self.opcode_size = 8
Yes, as Python is more permissive than C, I asked myself “why not making the opcodes configurable?”, leading to this self.opcode_size
attribute.
In the future I would like to be able to fully manage the VM and making it really customizable, the same goes for the opcodes, the instructions, the stack, the heap…
There is not much more to look for this VM class, only the exec function, which redirects to the corresponding instruction.
Exec test#
Let’s first manually compile the following assembly code into our VM’s opcodes:
1add a, 0x1000 ; 0x30001000
2mov b, a ; 0x01100000
3sub b, 0x100 ; 0x41000100
We then exec this code:
1softmachine exec -s "0x30001000;0x01100000;0x41000100"
2===== VM =====
3Registers:
4a: 4096
5b: 3840
6c: 0
7d: 0
8flags: 0
Stack#
As I represented the stack as a list, it is trivial to push/pop values:
1class Stack:
2 def __init__(self, max_size=0xffffff):
3 self.stack = list()
4 self.max_size = max_size
5 self.size = len(self.stack)
6
7 def push(self, value):
8 if self.size < self.max_size:
9 self.stack.append(value)
10 self.update()
11
12 def pop(self):
13 if self.size > 0:
14 self.stack.pop()
15 self.update()
16
17 def update(self):
18 self.size = len(self.stack)
And here is the corresponding code for the pop
instruction:
1def pop(vm, opcode):
2 if vm.stack.size < 1:
3 print("Cannot pop, the stack is empty.")
4 exit(1)
5
6 # first arg is always a register
7 target = opcode >> (vm.opcode_size - 2) * 4 & 0xf
8 if target < 0 or target >= len(vm.regs.regs):
9 print(f"Unexpected value for target: {target}. Must be between 0 (included) and {len(vm.regs.regs)} (excluded).")
10 exit(1)
11
12 value = vm.stack.stack[-1]
13 vm.regs.set(target, value)
14 vm.stack.pop()
So yeah, this is way easier to implement a VM in Python, as it abstracts a lot of things. The drawback is the loss of memory management.
Performances#
Let’s talk a bit about performances.
To let myself track the differences regarding execution time, I made a stress test for the VM. Very simple in the first place because there are not that much features.
1class VM:
2 # ...
3
4 def stress_test(self, n):
5 for x in range(n):
6 INSTRUCTIONS[1](self, 0x10ffffff)
7 for x in range(n):
8 INSTRUCTIONS[2](self, 0x20000000)
This does n
times push 0xffffff
, then n
times pop a
. I wanted not to put push
and pop
in the same loop cycle, in order to monitor the memory usage in case of many allocations.
Here are the results on my machine:
1time softmachine stresstest -n 10000
2# softmachine stresstest -n 10000 0.07s user 0.01s system 98% cpu 0.077 total
3
4time softmachine stresstest -n 1000000
5# softmachine stresstest -n 1000000 1.13s user 0.03s system 99% cpu 1.164 total
6
7time softmachine stresstest -n 10000000
8# softmachine stresstest -n 10000000 10.79s user 0.18s system 99% cpu 10.983 total
The execution time seems – for the moment – to be linear, depending on the number of instructions to execute.