File and Memory Offsets with DOS DEBUG

I often use the DOS DEBUG command to investigate older 16-bit programs; while there are doubtless better tools, DEBUG is ubiquitous and effective. There are many guides to its operation, but here I focus on a specific and practical question: How does one translate between file offsets and memory addresses?

Motivation

It is sometimes necessary to know the memory address to which a particular part of an executable was loaded. (Or, conversely, the file offset from which a particular region of memory was loaded.) For instance, if one wishes to permanently alter a program’s opcodes, one must know where those opcodes are located in the executable, and not merely where they reside in memory.

Segmented Memory

DOS DEBUG works with a 16-bit segmented memory model. Under this model addressable memory is limited to 1M bytes, from 0x00000 to 0xFFFFF. Memory addresses are represented by Segment:Offset pairs, where the 16-bit Segment is drawn from a Segment Register (e.g. CS, SS, DS, or ES) and the 16-bit Offset is drawn from another register (e.g. BX), an immediate, or some combination thereof.

A physical memory address is computed from a segmented address with the following formula:

  • Physical_Address = Segment*16 + Offset

The most immediately relevant consequence is that a single physical (or, if running on virtualized hardware, “physical”) memory location can be addressed with many (up to 4,096) different Segment:Offset pairs. For instance, all of these Segment:Offset pairs refer to the same physical address (i.e. 0x253A8):

  • 1CB2:8888
  • 1CB0:88A8
  • 1CB4:8868
  • 1DC3:7778
  • 1BA1:9998
  • 253A:0008
  • 153B:FFF8
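The arithmetic is easy to check mechanically. The following minimal sketch (the helper names are illustrative, not part of DEBUG or any other tool) confirms that every pair listed above resolves to 0x253A8. It also enumerates all legal pairs for that address, assuming both halves are 16-bit values.

```python
def physical(segment, offset):
    """Physical address of a Segment:Offset pair: Segment*16 + Offset."""
    return segment * 16 + offset

def segmented_forms(addr):
    """Every (segment, offset) pair with 16-bit halves that maps to addr."""
    pairs = []
    for segment in range(0x10000):
        offset = addr - segment * 16
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs

listed = ["1CB2:8888", "1CB0:88A8", "1CB4:8868", "1DC3:7778",
          "1BA1:9998", "253A:0008", "153B:FFF8"]

for pair in listed:
    seg, off = (int(part, 16) for part in pair.split(":"))
    print(pair, "->", hex(physical(seg, off)))

forms = segmented_forms(0x253A8)
print("%d pairs, segments %04X..%04X" % (len(forms), forms[0][0], forms[-1][0]))
# 4096 pairs, segments 153B..253A
```

Because the segment advances in 16-byte steps and the offset spans 64K, an address can have at most 0x10000/16 = 4,096 representations. Addresses near the bottom of memory have fewer.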

Therefore, when discussing the memory address to which a particular file offset maps, we are concerned with *physical* locations, which are only incidentally represented by Segment:Offset pairs.

The MZ Header

All DOS .EXE executables begin with an MZ header (.COM files, by contrast, have no header at all). For our purposes, we are interested in two values:

  Value                        Type              File Offset
  Header size in paragraphs    unsigned short    0x08
  Initial relative CS          unsigned short    0x16

The first value represents the executable’s header length as a number of 16-byte ‘paragraphs’. The second value is discussed in the next section.
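To show where these fields sit, the sketch below builds a hypothetical 24-byte header (every field value is invented for illustration) and reads the two values back with Python's struct module. Both fields are little-endian unsigned shorts, at 0x08 and 0x16.

```python
import struct

# Hypothetical header prefix: the 'MZ' signature, zeros elsewhere,
# plus made-up values for the two fields of interest.
header = bytearray(24)
header[0:2] = b"MZ"
struct.pack_into("<H", header, 0x08, 0x0020)  # header size: 0x20 paragraphs
struct.pack_into("<H", header, 0x16, 0x0010)  # initial relative CS

(header_paras,) = struct.unpack_from("<H", header, 0x08)
(relative_cs,) = struct.unpack_from("<H", header, 0x16)
print("header: %d paragraphs = %d bytes" % (header_paras, header_paras * 16))
print("initial relative CS: %04X" % relative_cs)
```

With these sample values the header occupies 32 paragraphs, i.e. the first 512 bytes of the file.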

Executable Loading

When DOS loads an executable into memory, it performs two actions of immediate interest to us:

  • Copy a contiguous area of the executable to a contiguous area of memory
  • Initialize registers, particularly the CS register

When DOS copies the executable, it first skips the header (whose length in paragraphs is given by the MZ header) and then reads the load image, which normally runs to the end of the file. (Strictly, the image length comes from the MZ header’s page-count fields at offsets 0x02 and 0x04, so any data appended beyond it is not loaded.) DOS begins writing at a “segment-aligned” address, which is a physical address representable by a segmented address with a zero Offset (e.g. 3246:0000, or physical address 0x32460).
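The span of the file that DOS copies can be computed from the header alone. The sketch below assumes the standard MZ fields at 0x02 (bytes used on the last 512-byte page, where 0 means the page is full) and 0x04 (total number of 512-byte pages, header included). The function name and sample header values are hypothetical.

```python
import struct

def load_image_span(header):
    """Return (start, end) file offsets of the bytes DOS copies into memory.

    start: header size in paragraphs (word at 0x08) times 16.
    end:   file length implied by the page fields at 0x02 and 0x04.
    """
    last_page_bytes, pages = struct.unpack_from("<HH", header, 0x02)
    (header_paras,) = struct.unpack_from("<H", header, 0x08)
    end = pages * 512
    if last_page_bytes:  # 0 means the final page is completely used
        end -= 512 - last_page_bytes
    return header_paras * 16, end

# Hypothetical header: 3 pages, 0x100 bytes used on the last, 0x20-paragraph header.
header = bytearray(24)
header[0:2] = b"MZ"
struct.pack_into("<HH", header, 0x02, 0x0100, 3)
struct.pack_into("<H", header, 0x08, 0x0020)

start, end = load_image_span(header)
print("image occupies file offsets %04X..%04X (%d bytes)" % (start, end - 1, end - start))
# image occupies file offsets 0200..04FF (768 bytes)
```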

During register initialization, the CS register is computed from the memory address at which DOS began writing the executable and a value (the “Initial relative CS”) taken from the MZ header. Specifically, the initial CS value is computed according to this formula:

  • Initial_CS = Physical_Start_Of_Executable/16 + Initial_Relative_CS
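As a quick check of the formula, the sketch below takes the load address from the example above (physical 0x32460) together with a hypothetical Initial relative CS of 0x0010. It then inverts the formula to recover the start address from a CS value, which is the direction we will actually need.

```python
def initial_cs(physical_start, relative_cs):
    """CS value DOS sets: the load paragraph plus the header's relative CS."""
    if physical_start % 16:
        raise ValueError("the image always starts on a paragraph boundary")
    return physical_start // 16 + relative_cs

def image_start(cs_value, relative_cs):
    """Inverse: the physical address at which the image was written."""
    return (cs_value - relative_cs) * 16

cs = initial_cs(0x32460, 0x0010)
print("CS = %04X" % cs)                            # CS = 3256
print("start = %05X" % image_start(cs, 0x0010))    # start = 32460
```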

These facts allow us to translate between file offsets and memory addresses.

Methodology

Since an executable file is read into memory as one contiguous chunk, we really only need one number – which we may call a Translator – to map file offsets to physical memory addresses, and vice-versa. This number will be employed in these formulas:

  • File_Offset = Physical_Address - Translator
  • Physical_Address = File_Offset + Translator

Given these formulas, the Translator is simply the difference between the physical address at which DOS began writing the executable and the file offset from which DOS began reading it.

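Rearranging the Initial_CS formula gives the first quantity as (Initial_CS - Initial_Relative_CS)*16, and the second is simply the header length, Header_Size*16 bytes. Putting these facts together, the formula for the Translator is:

  • Translator = (Initial_CS - Initial_Relative_CS)*16 - Header_Size*16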

Here “Initial_CS” is the CS value shown by DOS DEBUG’s “r” command immediately after the executable is loaded (DEBUG displays it in hexadecimal), while “Initial_Relative_CS” and “Header_Size” are the raw values at offsets 0x16 and 0x08 of the executable’s MZ header.
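A worked example may help. Suppose, hypothetically, that DEBUG’s “r” command reports CS=1CC2 and that the MZ header gives a header size of 0x20 paragraphs and an Initial relative CS of 0x0010 (all three values are invented for illustration). Since DEBUG prints registers in hex, CS is entered below as a hex literal.

```python
# Hypothetical values, invented for illustration:
cs_from_debug = 0x1CC2   # CS shown by DEBUG's "r" command (hexadecimal)
relative_cs = 0x0010     # MZ header word at offset 0x16
header_paras = 0x0020    # MZ header word at offset 0x08

load_segment = cs_from_debug - relative_cs          # segment where the image starts
translator = load_segment * 16 - header_paras * 16
print("Translator = %05X" % translator)             # Translator = 1C920

file_offset = 0x0400
physical = file_offset + translator
print("file %04X -> physical %05X = %04X:%04X"
      % (file_offset, physical, load_segment, physical - load_segment * 16))
# file 0400 -> physical 1CD20 = 1CB2:0200

# The inverse mapping recovers the original file offset.
assert physical - translator == file_offset
```

The result is what intuition suggests: file offset 0x400 lies 0x200 bytes past the 0x200-byte header, so it lands 0x200 bytes into the load segment.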

Code

This Python code captures the preceding discussion: Use the ComputeTranslator() function to, well, compute a translator for an executable (you must supply the “initialCS” value by running DOS DEBUG), and then use the MemoryToOffset() and OffsetToMemory() functions to translate between file offsets and memory addresses.

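# File Offset <-> Memory address computer (Python 3)
import struct

# Pull relevant values from an executable's MZ header:
# the header size in paragraphs (offset 0x08) and the initial relative CS (offset 0x16)
def ScanMzValues(fn):
    with open(fn, 'rb') as f:
        data = f.read(24)
    # DOS accepts either 'MZ' or 'ZM' as the signature
    if len(data) < 24 or data[:2] not in (b'MZ', b'ZM'):
        raise ValueError('%s does not begin with an MZ header' % fn)
    return struct.unpack('<8xH12xH', data)

# Display some values from an executable's MZ header
def FmtMzValues(hLen, csOff):
    return 'Header length: 16*%-3d, CS Offset: %04x' % (hLen, csOff)

# Given an initial CS value and an executable's filename, compute the Translator value.
# DEBUG's "r" command displays CS in hex, so pass it as a hex literal (e.g. 0x1CC2).
def ComputeTranslator(fn, initialCS):
    hLen, csOff = ScanMzValues(fn)
    return (initialCS - csOff - hLen)*16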

# Given a physical address, format it as a legal Segment:Offset pair, if possible
# Since a physical address may be represented by many Segment:Offset pairs,
# a value must be supplied for the segment
def PhysicalToSegmented(addy, segment):
    addy = int(addy); segment = int(segment)
    offset = addy-segment*16
    if ((offset >= 0)       and
        (offset <= 0xffff)  and
        (segment >= 0)      and
        (segment <= 0xffff)):
        return '%04X:%04X' % (segment, offset)
    else:
        return 'Invalid address/segment pairing'

# Given a segmented memory address and a translator, compute a file offset
def MemoryToOffset(segment, offset, translator):
    return segment*16 + offset - translator

# Given a file offset and a translator, compute a memory address
# If a segment is given, attempt conversion to a segmented memory address
def OffsetToMemory(offset, translator, segment=None):
    # Compare against None so that a segment of 0 is still honored
    if segment is not None:
        return PhysicalToSegmented(offset + translator, segment)
    else:
        return offset + translator
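PhysicalToSegmented() above requires the caller to choose a segment. When any valid pair will do, one common convention is the “normalized” form, in which the offset is kept between 0 and 15. The helper below is a hypothetical addition (it is not part of the original code) and assumes the address lies within the 1M-byte range.

```python
def physical_to_normalized(addr):
    """Return the Segment:Offset pair for addr whose offset is in 0..15."""
    if not 0 <= addr <= 0xFFFFF:
        raise ValueError("address outside the 1M-byte range: %X" % addr)
    return "%04X:%04X" % (addr >> 4, addr & 0xF)

print(physical_to_normalized(0x253A8))  # 253A:0008
print(physical_to_normalized(0x1CD20))  # 1CD2:0000
```

The first result matches the 253A:0008 entry in the list of equivalent pairs earlier in the post.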
This entry was posted in Reverse Engineering.