GCHQ VM

So, Britain’s GCHQ (the UK’s equivalent of the NSA) has put up a programming contest to attract coders. (The punchline is that the gig only pays $39,000, so good luck with that, chaps.) There are several parts, and the only one that appealed to me required the user to implement a little VM. Since I’m a sucker for simple machines, I thought I’d give it a go. (The original puzzle is here.)

Problem Statement

Cutting to the chase, here’s the description of the VM:

    // virtual machine architecture
    // ++++++++++++++++++++++++++++
    //
    // segmented memory model with 16-byte segment size (notation seg:offset)
    //
    // 4 general-purpose registers (r0-r3)
    // 2 segment registers (cs, ds equiv. to r4, r5)
    // 1 flags register (fl)
    //
    // instruction encoding
    // ++++++++++++++++++++
    //
    //           byte 1               byte 2 (optional)
    // bits      [ 7 6 5 4 3 2 1 0 ]  [ 7 6 5 4 3 2 1 0 ]
    // opcode      - - -             
    // mod               -           
    // operand1            - - - -
    // operand2                         - - - - - - - -
    //
    // operand1 is always a register index
    // operand2 is optional, depending upon the instruction set specified below
    // the value of mod alters the meaning of any operand2
    //   0: operand2 = reg ix
    //   1: operand2 = fixed immediate value or target segment (depending on instruction)
    //
    // instruction set
    // +++++++++++++++
    // 
    // Notes:
    //   * r1, r2 => operand 1 is register 1, operand 2 is register 2
    //   * movr r1, r2 => move contents of register r2 into register r1
    // 
    // opcode | instruction | operands (mod 0) | operands (mod 1)
    // -------+-------------+------------------+-----------------
    // 0x00   | jmp         | r1               | r2:r1
    // 0x01   | movr        | r1, r2           | rx,   imm 
    // 0x02   | movm        | r1, [ds:r2]      | [ds:r1], r2
    // 0x03   | add         | r1, r2           | r1,   imm
    // 0x04   | xor         | r1, r2           | r1,   imm 
    // 0x05   | cmp         | r1, r2           | r1,   imm 
    // 0x06   | jmpe        | r1               | r2:r1
    // 0x07   | hlt         | N/A              | N/A
    //
    // flags
    // +++++
    // 
    // cmp r1, r2 instruction results in:
    //   r1 == r2 => fl = 0
    //   r1 < r2  => fl = 0xff
    //   r1 > r2  => fl = 1
    // 
    // jmpe r1
    //   => if (fl == 0) jmp r1
    //      else nop

(Note that the complete problem statement also provides a JS definition of the VM, less its exec method, the implementation of which is the point of the exercise. We’ll be taking this definition as a given for the rest of this post, so please consult the original puzzle for details.)

Helpers

Since we’ll be going between opcode and JS representations of memory locations and registers, let’s build some util fxns to make that easier. (I also threw in some convenience vars and a fxn to grab a byte from and increment the IP.)

  exec: function()
  {
    // Convenience
    var cpu = this.cpu;
    var mem = this.mem;
    
    // Helpers
    function read_memory(s, o)
    {
        return mem[s*16 + o];
    }
    function write_memory(s, o, v)
    {
        mem[s*16 + o] = v;
    }
    function read_register(n)
    {
        switch(n)
        {
            case 0:
                return cpu.r0;
            case 1:
                return cpu.r1;
            case 2:
                return cpu.r2;
            case 3:
                return cpu.r3;
            case 4:
                // Equivalence from "docs"
                return cpu.cs;
            case 5:
                // Equivalence from "docs"
                return cpu.ds;
            default:
                throw "Unknown register index: " + n;
                break;
        }
    }
    function write_register(n, v)
    {
        switch(n)
        {
            case 0:
                cpu.r0 = v;
                break;
            case 1:
                cpu.r1 = v;
                break;
            case 2:
                cpu.r2 = v;
                break;
            case 3:
                cpu.r3 = v;
                break;
            case 4:
                // Equivalence from "docs"
                cpu.cs = v;
                break;
            case 5:
                // Equivalence from "docs"
                cpu.ds = v;
                break;
            default:
                throw "Unknown register index: " + n;
                break;
        }
    }
    function fetch_from_ip()
    {
        var r = read_memory(cpu.cs, cpu.ip);
        cpu.ip += 1;
        return r;
    }

    // ... more stuff ...
  }

Op Implementations

With that done, we can write some implementation code for the JMP/JMPE, ADD, XOR, and CMP operations. (MOV and HLT ops don’t require any logic beyond the helpers we just defined, since we’re not emitting any diagnostic information.)

  exec: function()
  {
    // ... helpers ...

    // Op implementations
    function jmp(ix, seg, go)
    {
        if (!go) return;
        cpu.cs = seg;
        cpu.ip = read_register(ix);
    }
    function add(ix, v)
    {
        write_register(ix, read_register(ix)+v);
    }
    function xor(ix, v)
    {
        write_register(ix, read_register(ix)^v);
    }
    function cmp(ix, v1)
    {
        var v0 = read_register(ix);
        if (v0 == v1)
            cpu.fl = 0;
        else if (v0 < v1)
            cpu.fl = 0xff;
        else
            cpu.fl = 1;
    }

    // ... more stuff ...
  }

Single-Step Interpreter and Loop

We can now implement the core VM logic, and a simple loop to invoke it until we hit a HLT. The only thing to note here is how much ambiguity there is in the VM specification: Are JMPs absolute, or relative? Does the 2nd argument to a FAR JMP evaluate to an immediate CS value, or a register index that should be dereferenced? Where is the result of an ADD or XOR operation stored? As it happened, my first guesses at the right answers to all such questions appeared to have been correct, but experiment was the only way to determine that.

  exec: function()
  {
    // ... helpers ...

    // ... op implementations ...

    // Single-stepper
    function run_next_op()
    {
        var b       = fetch_from_ip();
        var opcode  = (b & 0xe0) >> 5;
        var mod     = (b & 0x10) >> 4;
        var ix      = (b & 0x0f);
        
        switch (opcode)
        {
            case 0:
                jmp(ix, mod==0?cpu.cs:fetch_from_ip(), true);
                return true;
            case 1:
                write_register(ix, mod==0?read_register(fetch_from_ip()):fetch_from_ip());
                return true;
            case 2:
                if (mod == 0)
                    write_register(ix, read_memory(cpu.ds, read_register(fetch_from_ip())));
                else
                    write_memory(cpu.ds, read_register(ix), read_register(fetch_from_ip()));
                return true;
            case 3:
                add(ix, mod==0?read_register(fetch_from_ip()):fetch_from_ip());
                return true;
            case 4:
                xor(ix, mod==0?read_register(fetch_from_ip()):fetch_from_ip());
                return true;
            case 5:
                cmp(ix, mod==0?read_register(fetch_from_ip()):fetch_from_ip());
                return true;
            case 6:
                jmp(ix, mod==0?cpu.cs:fetch_from_ip(), cpu.fl==0);
                return true;
            case 7:
                return false;
        }
    }

    // Run until HLT
    while (run_next_op());
  }

The Answer

After the VM halts, it’s not immediately clear what the answer to the puzzle is. However, if you dump (the 768 bytes of) memory, interpret it as ASCII coded text, and eyeball it for meaningful strings, an HTTP request pops out at you.

I tweaked the JS code to run in Rhino, and added a dump statement to the bottom of VM.exec():

print(mem);

I captured this dump into a file (the -jar js.jar business is a Rhino-ism):

java -jar js.jar ~/Desktop/ukvm/src.js > mem.txt

Next, a little Python (bolding added) revealed the answer:

>>> ''.join(chr(int(e)) for e in file('mem.txt', 'rb').read().split(','))
'1\x043\xaa@\x02\x80\x03R\x00r\x01s\x01\xb2P0\x14\xc0\x01\x80\x00\x10\x10
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x002\x00u
\x0c1\x0832@\x02\x80\x03R\x00r\x01s\x03\xb2\x00\xc3\xb0\x000\x1b\xc0\x01
\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00u\x10\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\xcc\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}\x1f\x15
`MMR}\x0e\'m\x10mZ\x06VG\x14B\x0e\xb6\xb2\xb2\xe6\xeb\xb4\x83\x8e\xd7\xe5
\xd4\xd9\xc3\xf0\x80\x95\xf1\x82\x82\x9a\xbd\x95\xa4\x8d\x9a+0iJieU\x1c{i
\x1cn\x04t5!&/`\x03N7\x1e3T9\xe6\xba\xb4\xa2\xad\xa4\xc5\x95\xc8\xc1\xe4
\x8a\xec\xe7\x92\x8b\xe8\x81\xf0\xad\x98\xa4\xd0\xc0\x8d\xac"Re~\'+Z\x12a
\n\x01zk\x1dgGET /da75370fe15c4148bd4ceec861fbdaa5.exe HTTP/1.0\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x007z\x07\x11\x1f\x1dh%2w
\x1eb#[GUS0\x11B\xf6\xf1\xb1\xe6\xc3\xcc\xf8\xc5\xe4\xcc\xc0\xd3\x85\xfd
\x9a\xe3\xe6\x81\xb5\xbb\xd7\xcd\x87\xa3\xd3k6oofU0\x16E^\tt\\?)+f=\r\x02
0(5\x15\t\x15\xdd\xec\xb8\xe2\xfb\xd8\xcb\xd8\xd1\x8b\xd5\x82\xd9\x9a\xf1
\x92\xab\xe8\xa6\xd6\xd0\x8c\xaa\xd2\x94\xcfEFg }D\x14kEmT\x03\x17`bUZJfa
\x11Whu\x05b6}\x02\x10K\x08"B2\xba\xe2\xb9\xe2\xd6\xb9\xff\xc3\xe9\x8a
\x8f\xc1\x8f\xe1\xb8\xa4\x96\xf1\x8f\x81\xb1\x8d\x89\xcc\xd4xvar>7#Vsqyc|
\x08\x11 iz\x14h\x05!\x1e2\'Y\xb7\xcf\xab\xdd\xd5\xcc\x97\x93\xf2\xe7\xc0
\xeb\xff\xe9\xa3\xbf\xa1\xab\x8b\xbb\x9e\x9e\x8c\xa0\xc1\x9bZ//NN\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00'

Conclusion

This code is somewhat over-terse; in particular, there’s no good place to add comprehensive logging or diagnostics to it as written. I’d first written a more verbose implementation with plenty of places for those things, but didn’t end up needing them. You can download my actual JS solution here.

Share and Enjoy:
  • Twitter
  • Facebook
  • Digg
  • Reddit
  • HackerNews
  • del.icio.us
  • Google Bookmarks
  • Slashdot
This entry was posted in Reverse Engineering. Bookmark the permalink.

Comments are closed.