In May I took Dave Beazley’s week-long course on compilers. This is the first post in a series about that course. Here’s where you can see all the posts so far.
How does the code we write end up actually doing anything?

I’ve heard and read a few descriptions, and they mostly go like this: “First, your compiler converts the code into instructions that your machine can understand. Then, if you’ve used an interpreter, it runs those instructions on your machine—otherwise, you manually run the instructions.”
That description glosses over a lot of details. To dig into them, I cracked Bob Nystrom’s Crafting Interpreters (ongoing series on that over here). So far, that book has taught me a lot about the first sentence of the description above.
From Dave’s course, I expected a fair amount of review of what I’d already learned from the book. But Dave did something I didn’t expect: he starts the course with a study of the second sentence in that description:
What is a machine, and what does it mean to run instructions on it?
The answer to this question usually involves the Von Neumann Architecture diagram:

A machine has:
- A memory unit: for storing large amounts of data
- Registers: for storing a smaller amount of fast-access data
- Arithmetic/Logic Unit: for manipulating the data
- A Control Unit: for orchestrating the manipulations of the data
It’s helpful to see the machine in broad strokes like this, and the internet teems with longer descriptions of each of these pieces for folks who learn well by reading.
But I learn by getting my hands dirty. I want to experiment and break stuff. So I especially appreciated another learning tool that Dave provided: Metal. It’s a rudimentary model of how a machine processes instructions, represented in Python code. Here she is, in all her 100 lines of glory. Being able to run instructions on Metal and then use a debugger to follow her through the execution process gave me a much better idea of what happens inside a machine when it receives a series of instructions. I’ll show you some examples later. But first:
A Brief Overview of Metal
Metal has 8 registers for fast-access memory. The registers live in a dictionary, with keys R0-R7. Metal supports a small set of instructions that operate on the data in those registers. She accepts the instructions as tuples. These are the tuples she can execute, along with their Python “translations”:
# ('ADD', 'Ra', 'Rb', 'Rd') ; Rd = Ra + Rb
# ('SUB', 'Ra', 'Rb', 'Rd') ; Rd = Ra - Rb
# ('MUL', 'Ra', 'Rb', 'Rd') ; Rd = Ra * Rb
# ('DIV', 'Ra', 'Rb', 'Rd') ; Rd = Ra / Rb
# ('INC', 'Ra') ; Ra = Ra + 1
# ('DEC', 'Ra') ; Ra = Ra - 1
# ('AND', 'Ra', 'Rb', 'Rd') ; Rd = Ra & Rb (bitwise-and)
# ('OR', 'Ra', 'Rb', 'Rd') ; Rd = Ra | Rb (bitwise-or)
# ('XOR', 'Ra', 'Rb', 'Rd') ; Rd = Ra ^ Rb (bitwise-xor)
# ('SHL', 'Ra', nbits, 'Rd') ; Rd = Ra << nbits (left shift)
# ('SHR', 'Ra', nbits, 'Rd') ; Rd = Ra >> nbits (right shift)
# ('CMP', 'op', 'Ra', 'Rb', 'Rd') ; Rd = (Ra op Rb) (compare)
# ('CONST', value, 'Rd') ; Rd = value
# ('LOAD', 'Rs', 'Rd', offset) ; Rd = MEMORY[Rs + offset]
# ('STORE', 'Rs', 'Rd', offset) ; MEMORY[Rd + offset] = Rs
# ('JMP', 'Rd', offset) ; PC = Rd + offset
# ('BZ', 'Rt', offset) ; if Rt == 0: PC = PC + offset
# ('HALT',) ; Halts machine
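Under the hood, all a machine like this has to do is fetch the next tuple, bump the program counter, and dispatch on the opcode. Here’s a rough sketch of that loop in Python (my own reconstruction, not Dave’s actual 100 lines, and covering only a few of the opcodes above):

class TinyMetal:
    def __init__(self):
        self.registers = {f'R{n}': 0 for n in range(8)}   # R0-R7
        self.memory = [0] * 256
        self.pc = 0
        self.running = True

    def run(self, instructions):
        self.pc = 0
        self.running = True
        while self.running:
            op, *args = instructions[self.pc]   # fetch
            self.pc += 1                        # advance the program counter
            getattr(self, op)(*args)            # decode + execute

    def CONST(self, value, rd):
        self.registers[rd] = value

    def ADD(self, ra, rb, rd):
        self.registers[rd] = self.registers[ra] + self.registers[rb]

    def HALT(self):
        self.running = False

    # ...SUB, MUL, LOAD, STORE, JMP, BZ, and friends follow the same pattern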
The instructions look familiar to a Pythonista because they mimic the instructions that Python itself disassembles to:

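If you want to produce a listing like that screenshot yourself, the standard library’s dis module will disassemble any Python function (the exact opcode names vary a bit across Python versions):

import dis

def add_sub(a, b, c):
    return a + b - c

dis.dis(add_sub)
# Prints a listing of small instructions, roughly: a LOAD_FAST for each argument,
# a binary add, a binary subtract, and a RETURN_VALUE. Names differ by version.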
I’m ready to show you what Metal instructions look like, but first, a note from Dave, its author:
A couple of important things about metal. It’s meant to be a *greatly* simplified model of a CPU. There are many things missing. For example, it doesn’t have any support for overflow/underflow (which makes arithmetic more difficult). Nor any notion of different data types. Probably the key thing—it was solely meant as an instructional device and not some attempt (by me) to emulate an actual CPU.
– Dave Beazley, via Twitter DM
Okay? So don’t go bikeshedding this thing. I promise you right now, no one thinks you’re clever.
Example Instructions for Metal
Suppose we need to compute 3 + 4 – 5. Here’s how we might execute that with Metal:
machine = Metal()
instructions = [
    ('CONST', 3, 'R1'),
    ('CONST', 4, 'R2'),
    ('CONST', 5, 'R3'),
    ('ADD', 'R1', 'R2', 'R4'),
    ('SUB', 'R4', 'R3', 'R1'),
    ('STORE', 'R1', 'R0', IO_OUT),
    ('HALT',)
]
machine.run(instructions)
*Storing something to IO_OUT prints it. I know, I know. Bear with me.
So what’s happening here? First, we load the constants 3, 4, and 5 into registers where we’ll be able to access them. After that, we refer to each of these pieces of data by its register. We perform two arithmetic operations: one that finds the sum of the values in R1 and R2 and stores it in R4, and a second one that takes that sum and subtracts the value in R3, storing the result back in R1. Then we print the value in R1 and stop the program.
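In case a plain-Python mirror helps, here’s the same sequence with each line annotated with the tuple it stands in for (this assumes, as the examples seem to, that R0 holds 0 so that R0 + IO_OUT picks out Metal’s output address):

R1 = 3           # ('CONST', 3, 'R1')
R2 = 4           # ('CONST', 4, 'R2')
R3 = 5           # ('CONST', 5, 'R3')
R4 = R1 + R2     # ('ADD', 'R1', 'R2', 'R4')  -> 7
R1 = R4 - R3     # ('SUB', 'R4', 'R3', 'R1')  -> 2
print(R1)        # ('STORE', 'R1', 'R0', IO_OUT) writes 2 to the output address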
Immediately, you’ll notice the delicate dance of deciding where to store each value and which registers to overwrite. With only eight registers, it quickly becomes critical to make sure I’m not wasting them. That task occupied the majority of my thought as I began asking Metal for more complicated stuff. And Metal already simplifies this task compared to a standard CPU.
Let’s look at another Metal program, one that calculates 5 factorial:
machine = Metal()
instructions = [
    ('CONST', 6, 'R1'),
    ('CONST', 1, 'R2'),
    ('CONST', 1, 'R4'),
    ('SUB', 'R1', 'R2', 'R1'),
    ('MUL', 'R1', 'R4', 'R4'),
    ('CMP', '<=', 'R1', 'R2', 'R3'),
    ('BZ', 'R3', -4),
    ('STORE', 'R4', 'R0', IO_OUT),
    ('HALT',)
]
machine.run(instructions)
I cheated here by initializing the number I’m factorializing to 6 so that I can decrement it once before the operations begin. This example introduces some new commands:
- CMP: compare R1 and R2 with comparator ‘<=’. Store a 1 at R3 if the value of R1 is less than or equal to the value of R2. Otherwise, store a 0 at R3.
- BZ: branch if zero. If the value at R3 is 0 (meaning R1 is still greater than R2), move execution back 4 instructions (to the subtraction line). Otherwise, move on to the next instruction. We can use this to insert branching logic into the code.
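If it’s easier to see the branch in plain Python, this mirrors exactly what the instruction sequence above does to the registers on each pass through the loop:

R1, R2, R4 = 6, 1, 1              # the three CONST instructions
while True:
    R1 = R1 - R2                  # ('SUB', 'R1', 'R2', 'R1')
    R4 = R1 * R4                  # ('MUL', 'R1', 'R4', 'R4')
    R3 = 1 if R1 <= R2 else 0     # ('CMP', '<=', 'R1', 'R2', 'R3')
    if R3 != 0:                   # ('BZ', 'R3', -4) only loops back while R3 is 0
        break
print(R4)                         # ('STORE', 'R4', 'R0', IO_OUT) -> 120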
This example starts to give you an idea of how control flow would work. Suppose we push that a little further by attempting to express this Python function with Metal instructions:
def fact(n):
    result = 1
    while n > 0:
        result *= n
        n -= 1
    return result
Now we have a dynamic input and a function to call. We also have an operation to repeat, conditional on the decremented input reaching a predetermined minimum value.
machine = Metal()
instructions = [
    ('CONST', 1, 'R1'),
    ('CONST', 10, 'R2'),
    ('MUL', 'R1', 'R2', 'R3'),
    # WHILE LOOP
    ('CMP', '>', 'R1', 'R2', 'R4'),
    ('JMP', 'R0', 6),   # Go to start of function
    ('BZ', 'R4', -2),   # Go to incrementation
    # FACTORIAL CODE
    ('DEC', 'R2'),
    ('MUL', 'R3', 'R2', 'R3'),
    ('STORE', 'R3', 'R0', IO_OUT),
    ('CMP', '>=', 'R1', 'R2', 'R4'),
    ('BZ', 'R4', -5),
    ('HALT',),
    #
    ('JMP', 'R0', 4),   # go back to the end of the while loop
    ('HALT',),
]
machine.run(instructions)
This is where I really start cheating. At ('JMP', 'R0', 6), I have identified the index in the instructions where I perform the factorial calculation, and I’ve literally hardcoded that index into the while loop to make Metal jump to that point. When Metal finishes that part of the instructions, at ('BZ', 'R4', -5), it conditionally jumps back, again by a hardcoded number of instructions. This can’t possibly be how machine code actually works: the functions, as series of instructions, must get stored with some kind of index so other parts of the code can refer to them.
I suppose one could store that information in one of the registers. Eight registers really isn’t a lot of slots for storing data, is it? I wondered, as I wrote this code, whether “real” CPUs have many more registers than Metal does. As it turns out, not necessarily: 32-bit x86 CPUs, for example, also expose just eight general-purpose registers. Clearly I’m not using registers in these examples the way “real” machine code might. Time to find out what I’m doing wrong.
This Quora answer from Sankalp Jain gave me a better idea of what a machine might really store in a register—including a memory address, or a series of instructions represented as an integer, rather than just a lone integer. I need to dig deeper into my understanding of this, but this discovery exemplifies how working with Metal is helping me figure out what questions to ask to further my understanding of what a machine does.
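Metal can already gesture at this: nothing stops us from putting an address in a register and chasing it with LOAD and STORE. Here’s a small made-up example that sticks to the instruction semantics listed above (it assumes address 100 is free to scribble on and that R0 holds 0, as in the earlier examples):

machine = Metal()
instructions = [
    ('CONST', 100, 'R1'),           # R1 holds an *address*, not a number to compute with
    ('CONST', 42, 'R2'),
    ('STORE', 'R2', 'R1', 0),       # MEMORY[R1 + 0] = R2, i.e. MEMORY[100] = 42
    ('LOAD', 'R1', 'R3', 0),        # R3 = MEMORY[R1 + 0], i.e. 42
    ('STORE', 'R3', 'R0', IO_OUT),  # print it
    ('HALT',),
]
machine.run(instructions)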
The other thing about working with Metal is this: I cannot recommend highly enough running a series of instructions with the debugger on and a breakpoint set at the point where Metal consumes each individual operation. Here you see that I have set a breakpoint at line 85 and run the instructions for the factorial code, and I can look in the debugger to understand what command I’m on, what’s next, and what my registers look like at a given time.

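If you’d rather stay in a terminal than in an editor’s debugger, Python’s pdb gives you the same view. Something like this works, assuming machine and instructions are defined at the top level of your script as in the examples above:

import pdb

# Step with n/s; printing the machine's register attribute (whatever your copy
# of Metal calls it) shows the state between instructions.
pdb.run('machine.run(instructions)')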
Shortly in this series, we’ll move on to understanding how a compiler might accept code in a high-level language and make it runnable by “targeting” (generating output that can be run by) a given computer architecture or a given virtual machine. For the sake of a pedagogical example, a compiler could even target Metal itself. (If there’s interest, perhaps I’ll give that a try on a live stream later 😉).
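Just to gesture at what that might look like: here’s a toy sketch of my own (not anything from the course) that walks a tiny expression tree and emits Metal tuples, handing out a fresh register for every intermediate value (which, with only eight registers, is exactly the wastefulness a real compiler has to avoid):

IO_OUT = 0   # stand-in for whatever output address your copy of Metal defines

def compile_expr(node, code, next_reg=1):
    """Emit tuples for `node` into `code`; return (result register, next free register)."""
    if isinstance(node, int):
        reg = f'R{next_reg}'
        code.append(('CONST', node, reg))
        return reg, next_reg + 1
    op, left, right = node
    left_reg, next_reg = compile_expr(left, code, next_reg)
    right_reg, next_reg = compile_expr(right, code, next_reg)
    dest = f'R{next_reg}'
    opcode = {'+': 'ADD', '-': 'SUB', '*': 'MUL', '/': 'DIV'}[op]
    code.append((opcode, left_reg, right_reg, dest))
    return dest, next_reg + 1

code = []
result_reg, _ = compile_expr(('-', ('+', 3, 4), 5), code)   # (3 + 4) - 5
code += [('STORE', result_reg, 'R0', IO_OUT), ('HALT',)]
# `code` is now a list of Metal tuples much like the first example above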
But in the meantime, I appreciate that Dave began this class with exercises to help us understand the end result of the compilation process. I came away with a clear picture of what we ultimately need the compiler to do and a tool for improving my understanding of machine code.
If you liked this piece, you might also like:
The Crafting Interpreters series, which covers what I’m learning from Bob Nystrom’s book
The Raft series, of course! Learn to implement a distributed system from the comfort of your home, complete with gifs and smack talk!
The time I talked about what counts as a programming language in front of an audience of programming language designers. I strategically saved this talk for a pandemic-era Zoom conference to eliminate the possibility that they’d actually throw tomatoes.