How I Made It - Francis' Assembler



Project Highlights

How It Works

My assembler takes text representing ARM64 instructions, labels, and several assembler directives, and produces a Mach-O object file that can be passed to a linker to generate a working executable.

The first stage of my assembler closely resembles that of my Lisp compiler, which you can read about here.

The most interesting parts of my assembler are the code for assembling instructions and the code for generating Mach-O files.

Assembling ARM64 Instructions

ARM64 instructions have a fixed width of 32 bits, meaning the instruction and its operands are all encoded inside a single 32-bit word.
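To make the fixed-width encoding concrete, here is a minimal sketch that packs an opcode and two operands into one 32-bit word. It uses the 64-bit MOVZ (move wide immediate) encoding purely as an illustration. The class FixedWidthDemo and the method movz are hypothetical names and are not part of my assembler.

```java
// Hypothetical illustration only: packs "movz Xd, #imm16" into one 32-bit word.
final class FixedWidthDemo {

    // Layout of MOVZ (64-bit, no shift): 1_10_100101_00_imm16_Rd
    static int movz(int rd, int imm16) {
        int bits = 0b11010010_10000000_00000000_00000000; // fixed opcode bits
        bits |= (imm16 & 0xFFFF) << 5;                    // immediate in bits 5..20
        bits |= (rd & 0x1F);                              // destination register in bits 0..4
        return bits;
    }

    public static void main(String[] args) {
        // mov x0, #1
        System.out.printf("%08x%n", movz(0, 1)); // prints d2800020
    }
}
```

Everything the processor needs to know about the instruction (what it does, which register it writes, and the immediate value) fits in those 32 bits.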

The architecture of my assembler makes implementing new instructions easy: each instruction is its own Java class implementing an interface called 'Instruction'.

public interface Instruction {
    byte[] compile();
}

The Instruction interface

The Instruction interface is simple: it contains a single method called compile(), which returns the compiled/assembled instruction. Shown below is the implementation of the compile() method for the 'svc' instruction. 'svc' stands for 'Supervisor Call' and is used to execute system calls on ARM64.

The implementation allocates a 4-byte buffer, sets its byte order to little-endian, and initialises the integer 'bits' with the bit pattern representing the 'svc' instruction. The operand is then shifted left by 5 bits so that it lands in the instruction's 16-bit immediate field (bits 5 to 20).

@Override
public byte[] compile() {
    // One ARM64 instruction is 4 bytes, stored little-endian
    ByteBuffer bb = ByteBuffer.allocate(4);
    bb.order(ByteOrder.LITTLE_ENDIAN);
    // 11010100_000xxxxx_xxxxxxxx_xxx00001 (x = 16-bit immediate)
    int bits = 0b11010100_00000000_00000000_00000001;
    // param is assumed to already fit in 16 bits
    bits |= (param << 5);
    bb.putInt(bits);
    return bb.array();
}

Implementation of 'compile' for the 'svc' instruction
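As a usage sketch, the snippet below wraps the same compile() logic in a complete class and prints the bytes produced for 'svc 0x80'. The constructor, the SvcDemo wrapper class and the hex() helper are assumptions made for this example. Only the Instruction interface and the body of compile() come from the code shown above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SvcDemo {

    interface Instruction {
        byte[] compile();
    }

    static final class SvcInstruction implements Instruction {
        private final int param; // assumed to fit in 16 bits

        // Constructor shape is an assumption for this sketch
        SvcInstruction(int param) {
            this.param = param;
        }

        @Override
        public byte[] compile() {
            ByteBuffer bb = ByteBuffer.allocate(4);
            bb.order(ByteOrder.LITTLE_ENDIAN);
            int bits = 0b11010100_00000000_00000000_00000001;
            bits |= (param << 5);
            bb.putInt(bits);
            return bb.array();
        }
    }

    // Hypothetical helper: formats bytes as space-separated hex pairs
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // 0xD4000001 | (0x80 << 5) = 0xD4001001, written least significant byte first
        System.out.println(hex(new SvcInstruction(0x80).compile())); // prints 01 10 00 d4
    }
}
```

Because the buffer is little-endian, the least significant byte of the instruction word comes first in the output.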

Since all ARM64 instructions are 32 bits wide, it would be more efficient for compile() to return a 32-bit integer rather than a byte array. I decided to return byte arrays because I intend to support x86 instructions in the future, and x86 instructions are not a fixed size: they can be up to 15 bytes long.
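The sketch below shows why a byte[] return type copes with variable-length encodings: the code that concatenates instructions never needs to know how wide each one is. The assemble() method, the VariableWidthDemo class and the two x86 lambdas are hypothetical and only illustrate the idea. They are not taken from my assembler, which does not support x86 yet.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

final class VariableWidthDemo {

    interface Instruction {
        byte[] compile();
    }

    // Hypothetical helper: appends each instruction's bytes, whatever their length
    static byte[] assemble(List<Instruction> program) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Instruction instruction : program) {
            byte[] encoded = instruction.compile();
            out.write(encoded, 0, encoded.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // x86 "mov eax, 1" is 5 bytes: opcode 0xB8 followed by a little-endian imm32
        Instruction movEax1 = () -> new byte[] {(byte) 0xB8, 1, 0, 0, 0};
        // x86 "ret" is a single byte
        Instruction ret = () -> new byte[] {(byte) 0xC3};

        System.out.println(assemble(List.of(movEax1, ret)).length); // prints 6
    }
}
```

With an int return type, the 5-byte and 1-byte encodings above could not be represented without extra bookkeeping about how many bytes of each integer are meaningful.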

One nice consequence of having one class per instruction is that each class's toString() method can be used to print the instruction in its disassembled textual form.

@Override
public String toString() {
    return String.format("svc %d", param);
}

toString() implementation for the SvcInstruction class
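One way this could be used is to print a listing that shows each encoded instruction word next to its textual form. The sketch below is an assumption about how such a listing might look. The listingLine() helper, the ListingDemo class and the constructor are hypothetical, while compile() and toString() follow the code shown above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ListingDemo {

    interface Instruction {
        byte[] compile();
    }

    static final class SvcInstruction implements Instruction {
        private final int param;

        // Constructor shape is an assumption for this sketch
        SvcInstruction(int param) {
            this.param = param;
        }

        @Override
        public byte[] compile() {
            ByteBuffer bb = ByteBuffer.allocate(4);
            bb.order(ByteOrder.LITTLE_ENDIAN);
            int bits = 0b11010100_00000000_00000000_00000001;
            bits |= (param << 5);
            bb.putInt(bits);
            return bb.array();
        }

        @Override
        public String toString() {
            return String.format("svc %d", param);
        }
    }

    // Hypothetical helper: "offset: encoded word  disassembly" for a 4-byte instruction
    static String listingLine(int offset, Instruction instruction) {
        int word = ByteBuffer.wrap(instruction.compile())
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return String.format("%04x: %08x  %s", offset, word, instruction);
    }

    public static void main(String[] args) {
        System.out.println(listingLine(0, new SvcInstruction(0x80))); // prints 0000: d4001001  svc 128
    }
}
```

The %s format specifier calls toString() on the instruction, so any new instruction class gets a listing entry for free once it overrides toString().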