Archive for January, 2009

Generating HTML from Python

Friday, January 30th, 2009

Occasionally I write scripts that generate HTML documents. I understand the view that HTML is “object code” and that its formatting doesn’t matter, but I’ve never been able to adopt that position completely. As a result, I usually try to emit “pretty-printed” HTML, and I have developed a few small classes to help.
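
As a rough illustration of the idea, here is a minimal sketch of one such helper. It is not the code from the full article: the class name HtmlWriter and its methods (line, tag, text, render) are hypothetical, and the sketch assumes that two-space indentation and one element per line count as “pretty-printed”.

```python
from contextlib import contextmanager
from html import escape


class HtmlWriter:
    """Hypothetical helper that collects indented lines of HTML."""

    def __init__(self, indent="  "):
        self.indent = indent
        self.depth = 0
        self.lines = []

    def line(self, text):
        # Emit one line at the current nesting depth.
        self.lines.append(self.indent * self.depth + text)

    @contextmanager
    def tag(self, name, **attrs):
        # Open an element, indent everything inside it, then close it.
        attr_text = "".join(
            ' {}="{}"'.format(key, escape(str(value)))
            for key, value in attrs.items()
        )
        self.line("<{}{}>".format(name, attr_text))
        self.depth += 1
        try:
            yield self
        finally:
            self.depth -= 1
            self.line("</{}>".format(name))

    def text(self, name, content):
        # Emit a one-line element such as <li>item</li>, escaping its content.
        self.line("<{0}>{1}</{0}>".format(name, escape(content)))

    def render(self):
        return "\n".join(self.lines) + "\n"


writer = HtmlWriter()
with writer.tag("ul", id="topics"):
    writer.text("li", "Opcode map")
    writer.text("li", "Addressing & demo code")
html_output = writer.render()
print(html_output)
```

Using a context manager for nested elements means the closing tag and the indentation level cannot get out of step with the opening tag, which is most of what hand-written print statements tend to get wrong.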

(more…)

Disassembler (Odds and Ends)

Friday, January 23rd, 2009

This week, I post some remarks following up on the just-concluded disassembler tutorial. There are always a few loose ends to tie up, and I want to clarify or expand on the following:

  • The opcode map
  • Addressing
  • Demo code
  • Hardening

(more…)

Disassembler (Part 3)

Friday, January 16th, 2009

Editorial Note: Over the last two weeks, we’ve done the groundwork for building a disassembler: We’ve seen how to find documentation for a machine’s instruction format, how to read machine code by hand, and how to build up a machine-readable opcode map. Now it’s time to build the disassembler itself.

This article will walk through the process of building a disassembler for 8086 integer instructions in Python. It takes a “bottom-up” approach, beginning with low-level data structures, and ending with a method that disassembles an arbitrary sequence of bytes.
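
To give a flavor of the “bottom-up” shape, here is a minimal sketch, not the disassembler developed in the full article. It assumes a tiny hand-picked subset of 8086 opcodes (NOP, RET, HLT, INT imm8, and the register forms of INC, PUSH, POP, and MOV reg16,imm16) with no ModR/M decoding at all. The names OPCODES, decode_one, and disassemble are hypothetical, as is the default origin of 0100h (borrowed from the DOS .COM convention).

```python
# 16-bit registers in the order used by the low three bits of the opcode.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

# Low-level data structure: opcode -> (text template, immediate size in bytes).
OPCODES = {
    0x90: ("NOP", 0),
    0xC3: ("RET", 0),
    0xF4: ("HLT", 0),
    0xCD: ("INT {imm:02X}", 1),
}
for index, reg in enumerate(REG16):
    OPCODES[0x40 + index] = ("INC " + reg, 0)
    OPCODES[0x50 + index] = ("PUSH " + reg, 0)
    OPCODES[0x58 + index] = ("POP " + reg, 0)
    OPCODES[0xB8 + index] = ("MOV " + reg + ",{imm:04X}", 2)


def decode_one(code, offset):
    """Return (text, length) for the instruction starting at offset."""
    opcode = code[offset]
    entry = OPCODES.get(opcode)
    if entry is None:
        # Unknown opcode: show it as a raw data byte and move on.
        return "DB {:02X}".format(opcode), 1
    template, imm_size = entry
    imm_bytes = code[offset + 1 : offset + 1 + imm_size]
    if len(imm_bytes) < imm_size:
        # Truncated instruction at the end of the buffer.
        return "DB {:02X}".format(opcode), 1
    # 8086 immediates are stored little-endian.
    imm = int.from_bytes(imm_bytes, "little")
    return template.format(imm=imm), 1 + imm_size


def disassemble(code, origin=0x100):
    """Disassemble a byte sequence into a list of 'ADDR  TEXT' lines."""
    listing = []
    offset = 0
    while offset < len(code):
        text, length = decode_one(code, offset)
        listing.append("{:04X}  {}".format(origin + offset, text))
        offset += length
    return listing


listing = disassemble(bytes([0xB8, 0x34, 0x12, 0x50, 0x40, 0xCD, 0x21, 0xC3]))
print("\n".join(listing))
# 0100  MOV AX,1234
# 0103  PUSH AX
# 0104  INC AX
# 0105  INT 21
# 0107  RET
```

A real disassembler for 8086 integer instructions also has to handle ModR/M bytes, prefixes, and relative jumps, so the table above would need far richer entries; the point here is only the layering of a table, a single-instruction decoder, and a loop over the bytes.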

(more…)

Disassembler (Part 2)

Friday, January 9th, 2009

Editorial Note: This article is the second in a three-part series on writing an 8086 disassembler. Today we’ll cover the practical issues involved in finding an opcode map; we saw last week that such a map is central to the process of disassembly. Next week, we’ll use this map (and Python!) to build a disassembler for 8086 integer instructions.

At first, it seems pretty easy to find an opcode map for an 8086 processor: just consult Intel’s documentation. Unfortunately, there are two problems with this approach. First, and most importantly, the quality of the published maps is somewhat poor. Second, the 8086 is a very old (c. 1978) chip, and documentation dedicated to it (as opposed to later members of its family) is not easy to come by. Both problems can be overcome by consulting multiple resources.

(more…)

Disassembler (Part 1)

Friday, January 2nd, 2009

Editorial Note: This article is the first in a three-part series on writing a disassembler. Today we’ll cover the high-level concepts involved in disassembly and see how to read machine code “by hand”. Next week, we’ll look at the issues involved in finding and/or constructing an opcode map. Finally, in week 3, we’ll build a disassembler for 8086 integer instructions, using Python.

Over the last few weeks, we’ve seen how to use a debugger (such as DOS DEBUG) to find and examine interesting portions of an executable’s machine code. Now it’s time to begin considering how the debugger performs some of its tricks. I don’t want to launch into a full discussion of debugger programming (yet!), but I do want to talk a little about how to produce an assembly-language view of an executable’s instructions.

(more…)