Extracting bitmaps from Flash

Flash content is pretty common on the Web, but it is considerably less transparent than traditional HTML-based content. Whereas HTML-based content is built from human-readable script files and URL resource definitions, Flash content is locked inside a binary file format. It is sometimes necessary to convert Flash content into a more manageable form. As an example of such a conversion, I discuss the process of extracting all JPEG images from a Flash (SWF) file, and present some Python code which performs the extraction.

Resources

The first step in extracting data from SWF files is finding a good description of their format. Good documentation can be a little tricky to find, but here are three resources I found helpful:

Alexis’ SWF Reference
Musicman’s Flash Experiments (particularly the SWF analyzer)
Adobe’s SWF documentation

With this information in hand, we can turn to the first step in parsing our file.

File Header

Every SWF file begins with an 8-byte file header: a 3-character signature, a 1-byte version number, and a 4-byte file length. The important part of this header is the first character, which identifies the SWF file as either a compressed (“C”) or uncompressed (“F”) SWF. If the file is compressed, we will need to decompress it before proceeding to the next step.

Compressed files consist of the file header followed by a zlib-compressed data block. To decompress them, simply:

Copy the first 8 bytes to a new file
Invoke a zlib decompressor (e.g. Python’s zlib.decompress()) on the rest of the compressed file
Append the output of the zlib decompressor to the new file
Change the first byte of the new file from “C” to “F”

This post goes into a little more detail, and provides sample code.
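
The four steps above can also be sketched in a few lines of Python 3. This is a minimal illustration rather than a replacement for the full listing at the end of this article; the function name inflate_swf() and the made-up test payload are my own, and the sketch assumes the whole file fits comfortably in memory.

```python
import struct
import zlib


def inflate_swf(data: bytes) -> bytes:
    """Return an uncompressed ("FWS") copy of the SWF bytes in `data`."""
    signature = data[:3]
    if signature == b"FWS":
        return data                      # already uncompressed
    if signature != b"CWS":
        raise ValueError("not an SWF file this sketch knows how to handle")
    header, body = data[:8], data[8:]
    # Keep the 8-byte header, change "C" to "F", and inflate everything after it.
    return b"F" + header[1:] + zlib.decompress(body)


# Round trip on a made-up payload (not a real movie).
payload = bytes(range(32))
plain = b"FWS" + bytes([9]) + struct.pack("<L", 8 + len(payload)) + payload
packed = b"CWS" + plain[3:8] + zlib.compress(payload)
print(inflate_swf(packed) == plain)      # True
```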

We can now proceed to extract data from the uncompressed SWF file. (From this point forward, all references to SWF files implicitly refer to uncompressed SWF files.)

Movie Header

Immediately following the file header, we find a movie header. The movie header contains some basic information about the Flash file (e.g. its width, height, and framerate). More importantly, this header is (in principle) of variable length, and lies between the start of the file and the tags that make up the bulk of the SWF. Therefore, we must parse it in order to skip over it, and get to the tag data we really care about.

(Some sources report that the movie header is always 13 bytes long; if accurate, then this information would be sufficient for our purposes, as we only need to find the location of the first tag in the SWF. However, I prefer to play it safe, and read the movie header properly.)

I will use the following bytestream to illustrate the parsing of a movie header. These bytes were taken from a typical SWF file:

File Offset (Hex)	Value (Hex)	Value (Binary)
`08`	`78`	`0111 1000`
`09`	`00`	`0000 0000`
`0A`	`05`	`0000 0101`
`0B`	`5F`	`0101 1111`
`0C`	`00`	`0000 0000`
`0D`	`00`	`0000 0000`
`0E`	`0B`	`0000 1011`
`0F`	`B8`	`1011 1000`
`10`	`00`	`0000 0000`
`11`	`00`	`0000 0000`
`12`	`24`	`0010 0100`
`13`	`63`	`0110 0011`
`14`	`00`	`0000 0000`

A movie header is made up of a RECT structure (which is, in turn, composed of several bit fields) and two byte fields representing unsigned shorts. To read this header, we must take a moment to understand how bit fields and byte fields work in SWF files.

Bit Fields

SWF bit fields are arrays of bits which represent numbers. These bit arrays may be of unusual lengths, e.g. 5 bits or 15 bits, as opposed to the more typical 8/16/32/64 bit lengths. These bit arrays are not aligned to file bytes; they may begin (or end) in the middle of an SWF file byte.

The order in which the bits of a bit field are stored in an SWF file is not obvious: If a bit field spans multiple bytes in the file, the most significant bit(s) of the field will be found in the first byte occupied by the field. Within a byte, the most significant bit of the field will be found in the most significant bit occupied by the field.
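
To make the bit ordering concrete, here is a small Python 3 sketch that reads an unsigned bit field one bit at a time. The function name read_bits() and the sample bytes are mine, chosen only for illustration; the BitStream class in the listing at the end of this article does the same job more efficiently.

```python
def read_bits(data: bytes, bit_offset: int, nbits: int) -> int:
    """Read an unsigned field of nbits bits, starting bit_offset bits into data."""
    value = 0
    for i in range(bit_offset, bit_offset + nbits):
        # Bit 0 of the stream is the most significant bit of byte 0.
        bit = (data[i // 8] >> (7 - i % 8)) & 1
        # Earlier bits are more significant, so shift them up as we go.
        value = (value << 1) | bit
    return value


# A 3-bit field stored in bits 1-3 of a single byte (1[011]0000).
print(read_bits(bytes([0b10110000]), 1, 3))        # 3
# A 12-bit field that starts mid-byte and spills into the next byte.
print(hex(read_bits(bytes([0x0A, 0xBC]), 4, 12)))  # 0xabc
```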

To return to the movie header: A RECT structure is made up of 5 bit fields. The first field is a 5-bit unsigned integer N, which gives the length in bits of each of the remaining 4 fields; those fields are N-bit signed integers. (This arrangement makes the RECT structure variable-length, and tricky to skip over.)

In the bytestream given above, the first 9 bytes represent a RECT. The first 5 bits are 01111, equal to 15. The four 15-bit signed integers that follow are:

Value (Binary)	Value (Hex)	Value (Decimal)
`000 0000 0000 0000`	`0x0000`	`0`
`010 1010 1111 1000`	`0x2af8`	`11000`
`000 0000 0000 0000`	`0x0000`	`0`
`001 0111 0111 0000`	`0x1770`	`6000`

The fields are stored in the order min_x, max_x, min_y, max_y, so this RECT has a min_x and min_y of 0, a max_x of 11000, and a max_y of 6000. Since this RECT is from a movie header, these values represent the size of the movie frame in twips (1 twip is 1/20 of a pixel). The movie from which this header was taken therefore has a 550×300 pixel frame.
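
The decoding above can be checked mechanically. The following Python 3 sketch parses the RECT out of the 13 example bytes; the helper names are my own, and the trick of turning the whole buffer into one big-endian integer is just a convenient shortcut for a short buffer, not how the listing at the end of this article works.

```python
# The 13 example bytes from file offsets 0x08 through 0x14.
HEADER = bytes.fromhex("7800055f00000bb80000246300")


def read_bits(data: bytes, bit_offset: int, nbits: int) -> int:
    """Unsigned nbits-wide field, most significant bit first."""
    as_int = int.from_bytes(data, "big")
    shift = len(data) * 8 - bit_offset - nbits
    return (as_int >> shift) & ((1 << nbits) - 1)


def to_signed(value: int, nbits: int) -> int:
    """Reinterpret an unsigned nbits-wide value as two's complement."""
    sign_bit = (1 << nbits) >> 1
    return value - (1 << nbits) if value & sign_bit else value


def read_rect(data: bytes):
    """Return (min_x, max_x, min_y, max_y) in twips."""
    nbits = read_bits(data, 0, 5)
    return tuple(
        to_signed(read_bits(data, 5 + i * nbits, nbits), nbits)
        for i in range(4)
    )


min_x, max_x, min_y, max_y = read_rect(HEADER)
print((min_x, max_x, min_y, max_y))              # (0, 11000, 0, 6000)
print((max_x - min_x) / 20, (max_y - min_y) / 20)  # 550.0 300.0
```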

Byte Fields

SWF byte fields are arrays of bytes which represent numbers. These byte arrays always have typical lengths, e.g. 1, 2, 4, or 8 bytes, and are always aligned with SWF file bytes. If a byte field follows a bit field which ended in the middle of an SWF file byte, the byte field begins with the following SWF file byte. Byte fields are stored in LSB order, i.e. least significant byte first (little-endian).

Continuing our earlier example, consider the balance of the movie header, which occupies bytes 0x11 through 0x14. (The last 7 bits in byte 0x10 are unused due to the alignment considerations just discussed.) The movie header ends with two unsigned shorts, which decode (in this case) to 0x2400 and 0x0063. The first represents the frame rate of the movie in 8.8 fixed point notation, and the second the number of frames in the movie. This movie is intended to play at 36fps, and has 99 total frames.
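
Here is the same arithmetic as a short Python 3 sketch using the standard struct module. The variable names are mine; the only inputs are the 13 example bytes and the RECT layout described earlier.

```python
import struct

# The 13 example bytes from file offsets 0x08 through 0x14.
HEADER = bytes.fromhex("7800055f00000bb80000246300")

nbits = HEADER[0] >> 3                 # top 5 bits of the first byte: 15
# The RECT uses 5 + 4*nbits bits; byte fields resume at the next whole byte.
rect_len = (5 + 4 * nbits + 7) // 8    # 9 bytes

# "<HH" means two little-endian unsigned shorts.
rate_raw, frame_count = struct.unpack_from("<HH", HEADER, rect_len)
fps = rate_raw / 256                   # 8.8 fixed point: integer byte, fraction byte

print(hex(rate_raw), fps, frame_count)  # 0x2400 36.0 99
```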

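Tags

The rest of the SWF file, from the end of the movie header onward, is a sequence of tags. Each tag begins with a byte field representing an unsigned short. The upper 10 bits of this short identify the type of the tag, and the lower 6 bits give the length of the tag’s data block in bytes. If the length bits are all set (i.e. the length reads as 63), the actual length is stored in an unsigned long immediately following the short. The data block follows this header. A tag of type 0 marks the end of the file.

We only care about four tag types: JPEGTables (8), DefineBitsJPEG (6), DefineBitsJPEG2 (21), and DefineBitsJPEG3 (35). Every other tag can be skipped over using its length.

JPEGTables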

This tag contains the JPEG encoding information shared by all DefineBitsJPEG tags. It includes a beginning-of-stream JPEG marker, which is appropriate, and an end-of-stream marker, which is not. (We’ll eventually glue the data from the JPEGTables tag together with data from DefineBitsJPEG tags, and most applications don’t expect an end-of-stream marker in the middle of a JPEG file.)

When processing this tag, copy all but the last two bytes of its data block to a buffer, and store this buffer for later use.
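
As a minimal Python 3 sketch of this step: the function name strip_tables_eoi() is mine, and unlike the description above it checks that the last two bytes really are the end-of-stream marker (FF D9) before dropping them. That check is my own addition, so that an empty or unusual tag is passed through untouched.

```python
JPEG_EOI = b"\xff\xd9"   # JPEG end-of-stream (End Of Image) marker


def strip_tables_eoi(tag_data: bytes) -> bytes:
    """Return a JPEGTables data block without its trailing end-of-stream marker."""
    if tag_data.endswith(JPEG_EOI):
        return tag_data[:-2]
    return tag_data          # nothing to strip, e.g. an empty tag


# Made-up stand-in for a real tables block: FF D8, some table bytes, FF D9.
tables = strip_tables_eoi(b"\xff\xd8" + b"TABLES" + b"\xff\xd9")
print(tables)                # b'\xff\xd8TABLES'
```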

DefineBitsJPEG

This tag contains a JPEG-encoded bitmap. It does not include the encoding tables, which are stored separately in a JPEGTables tag. This tag’s data begins with a byte field representing an unsigned short; this short is a unique identifier for the bitmap. Following the ID is the JPEG data. The JPEG data includes a beginning-of-stream JPEG marker, which is problematic, since most applications don’t expect a beginning-of-stream marker in the middle of a JPEG file. The JPEG data ends with an end-of-stream marker, which is fine.

When processing this tag, first read the leading short, and store it for use as an identifier for the JPEG. Next, skip the following two bytes of the data block, as they represent the undesirable beginning-of-stream JPEG marker. Finally, concatenate the JPEG tables (found in an earlier JPEGTables tag) and the remainder of the DefineBitsJPEG data block to create a complete JPEG image.
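
In Python 3, that assembly might look like the sketch below. The function name build_jpeg() and the placeholder byte strings are mine; `tables` is assumed to be the JPEGTables data with its trailing end-of-stream marker already removed.

```python
import struct

JPEG_SOI = b"\xff\xd8"   # JPEG beginning-of-stream (Start Of Image) marker


def build_jpeg(tables: bytes, tag_data: bytes):
    """Combine stored JPEG tables with a DefineBitsJPEG data block.

    Returns (bitmap_id, complete_jpeg_bytes).
    """
    (bitmap_id,) = struct.unpack_from("<H", tag_data, 0)   # leading unsigned short
    image = tag_data[2:]
    if image.startswith(JPEG_SOI):
        image = image[2:]        # drop the unwanted beginning-of-stream marker
    return bitmap_id, tables + image


# Placeholder data, not a real image.
tables = b"\xff\xd8" + b"TABLES"
tag_data = struct.pack("<H", 7) + b"\xff\xd8" + b"SCAN" + b"\xff\xd9"
print(build_jpeg(tables, tag_data))   # (7, b'\xff\xd8TABLESSCAN\xff\xd9')
```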

DefineBitsJPEG2

This tag contains a complete JPEG-encoded bitmap. This tag’s data begins with a byte field representing an unsigned short; this short is a unique identifier for the bitmap. Following the ID is the JPEG data. Some documentation indicates that the JPEG data includes an end-of-stream / beginning-of-stream pair immediately following the JPEG tables, but this does not appear to be the case.

When processing this tag, first read the leading short, and store it for use as an identifier for the JPEG. Next, read the remainder of the DefineBitsJPEG2 data block as a complete JPEG image.
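
A Python 3 sketch of this step follows. Since the documentation and my test files disagree about the end-of-stream / beginning-of-stream pair, the sketch also reports whether that 4-byte sequence (FF D9 FF D8) shows up, so you can check your own files. The function name split_jpeg2() and the placeholder data are mine.

```python
import struct

MARKER_PAIR = b"\xff\xd9\xff\xd8"   # end-of-stream followed by beginning-of-stream


def split_jpeg2(tag_data: bytes):
    """Split a DefineBitsJPEG2 data block.

    Returns (bitmap_id, jpeg_bytes, has_marker_pair).
    """
    (bitmap_id,) = struct.unpack_from("<H", tag_data, 0)   # leading unsigned short
    jpeg = tag_data[2:]
    return bitmap_id, jpeg, MARKER_PAIR in jpeg


# Placeholder data, not a real image.
tag_data = struct.pack("<H", 12) + b"\xff\xd8" + b"IMAGE" + b"\xff\xd9"
bitmap_id, jpeg, has_pair = split_jpeg2(tag_data)
print("%d.jpg" % bitmap_id, len(jpeg), has_pair)   # 12.jpg 9 False
```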

DefineBitsJPEG3

This tag contains a complete JPEG-encoded bitmap, and an ancillary zlib-compressed alpha channel. This tag’s data begins with a byte field representing an unsigned short; this short is a unique identifier for the bitmap. Following the ID is a byte field representing an unsigned long; this long stores the number of bytes of JPEG data (as distinct from alpha data) in the tag. Following the length is the JPEG data, which includes an end-of-stream / beginning-of-stream pair immediately following the JPEG tables. (Most applications don’t expect these markers in the middle of a JPEG file, so they’ll have to be removed.) Following the JPEG data is the alpha data. Note that the alpha channel is stored separately from the JPEG image data because JPEG, as a lossy format, is a poor carrier of alpha channel information.

When processing this tag, first read the leading short, and store it for use as an identifier for the JPEG. Next, read the JPEG data length from the unsigned long stored in the next 4 bytes. Then read a number of bytes equal to the JPEG data length as a complete JPEG image. Remove the end-of-stream / beginning-of-stream pair from the image data, and convert the image to a format that supports alpha-channel transparency, e.g. PNG. Pass the remainder of the DefineBitsJPEG3 data block to a zlib decompressor, and store the resulting bytestream to the alpha channel of the converted image.
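
The byte-level part of this procedure can be sketched in Python 3 as below; combining the image with its alpha channel needs an imaging library, and is left to the full listing. The function name split_jpeg3() and the placeholder data are mine.

```python
import struct
import zlib

MARKER_PAIR = b"\xff\xd9\xff\xd8"   # end-of-stream followed by beginning-of-stream


def split_jpeg3(tag_data: bytes):
    """Split a DefineBitsJPEG3 data block.

    Returns (bitmap_id, jpeg_bytes, alpha_bytes).
    """
    # Unsigned short ID, then unsigned long JPEG length: 6 bytes in all.
    bitmap_id, jpeg_len = struct.unpack_from("<HL", tag_data, 0)
    jpeg = tag_data[6:6 + jpeg_len].replace(MARKER_PAIR, b"")
    alpha = zlib.decompress(tag_data[6 + jpeg_len:])
    return bitmap_id, jpeg, alpha


# Placeholder data, not a real image: "tables", marker pair, "image", 4 alpha bytes.
jpeg_in = b"\xff\xd8" + b"TBL" + MARKER_PAIR + b"IMG" + b"\xff\xd9"
alpha_in = bytes([0, 128, 255, 64])
tag_data = struct.pack("<HL", 3, len(jpeg_in)) + jpeg_in + zlib.compress(alpha_in)
print(split_jpeg3(tag_data))
# (3, b'\xff\xd8TBLIMG\xff\xd9', b'\x00\x80\xff@')
```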

DefineBitsLossless(2)

The DefineBitsLossless and DefineBitsLossless2 tags are additional bitmap-encoding tags; I’m not covering their extraction here, because:

They’re sort of ugly
I don’t have good test cases handy for all 6 variations of these tags

I mention them only for completeness.

Code

The dump_jpegs() function in this code will, given a pathname to an SWF, dump all JPEGs within that SWF to the SWF’s directory. Each output file will be named according to the unique ID found in the JPEG’s tag.

Since WordPress mangles quotes (and other characters, seemingly at random) something horrible, I’m also making the code available as a download here.
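
One piece of the listing is worth pulling out on its own: the main loop of dump_jpegs() reads a 2-byte tag header, takes the upper 10 bits as the tag type and the lower 6 bits as the length, and reads a 4-byte length instead when those 6 bits equal 63. Here is a standalone Python 3 sketch of that loop, operating on a bytes object rather than a file. The name iter_tags() and the made-up tag stream are mine.

```python
import struct


def iter_tags(data: bytes, offset: int = 0):
    """Yield (tag_type, data_block) pairs, stopping at the end tag (type 0)."""
    while offset + 2 <= len(data):
        (code,) = struct.unpack_from("<H", data, offset)
        offset += 2
        tag_type, length = code >> 6, code & 0x3F
        if length == 0x3F:
            # Long form: the real length is in the next 4 bytes.
            (length,) = struct.unpack_from("<L", data, offset)
            offset += 4
        if tag_type == 0:
            return
        yield tag_type, data[offset:offset + length]
        offset += length


# A made-up stream: a short tag (type 8), a long tag (type 21), then the end tag.
stream = (
    struct.pack("<H", (8 << 6) | 2) + b"ab"
    + struct.pack("<HL", (21 << 6) | 0x3F, 100) + b"x" * 100
    + struct.pack("<H", 0)
)
print([(t, len(block)) for t, block in iter_tags(stream)])   # [(8, 2), (21, 100)]
```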

# Dump all JPEG tags from an SWF file
# (Written for Python 2; 'import Image' needs the Python Imaging Library)
import  os
import  zlib
import  struct
import  StringIO

import  Image

# Helpers for reading SWF files
def CalcMaskShift(pos, len):
    shift = pos - len + 1
    return (pow(2, len) - 1) << shift, shift

class BitStream(object):
    lut = dict(((pos, len), CalcMaskShift(pos, len)) for pos in range(8) for len in range(1, pos+2))

    def __init__(self, fp):
        self.fp = fp
        self.next()

    def next(self):
        c = self.fp.read(1)
        if (c): self.curr_byte = ord(c)
        else: self.curr_byte = None
        self.bit_pos = 7

    def tell(self):
        return (self.fp.tell()-1, self.bit_pos)

    def seek(self, curr_byte, bit_pos=7):
        self.fp.seek(curr_byte)
        self.next()
        self.bit_pos = bit_pos

    def align(self):
        if (self.bit_pos != 7): self.next()

    def make_signed(self, val, size):
        flag = pow(2, size-1)
        if (val >= flag):
            return val - flag - flag
        else:
            return val

    # Bit order is preserved; msb stored in msb
    def read_partial_byte(self, size):
        mask, shift = self.lut[(self.bit_pos, size)]
        rv = (self.curr_byte & mask) >> shift
        self.bit_pos = (self.bit_pos - size) & 0x7
        if (self.bit_pos == 7): self.next()
        return rv

    # Bitfields are stored in MSB order
    def read_bits(self, size, signedp):
        # Segment read
        head_len = min(size, self.bit_pos+1)
        body_len = (size - head_len) // 8
        tail_len = (size - head_len) & 0x7
        # Perform read
        rv = 0
        if (head_len):
            rv = self.read_partial_byte(head_len)
        while (body_len):
            rv = rv * 256 + self.curr_byte
            self.next(); body_len -= 1
        if (tail_len):
            rv = (rv << tail_len) + self.read_partial_byte(tail_len)
        if (signedp):
            rv = self.make_signed(rv, size)
        return rv

    # Byte values are stored in LSB order
    def read_bytes(self, size, signedp):
        self.align(); rv = 0; factor = 1; nbits = size*8
        while (size):
            rv += (self.curr_byte*factor)
            self.next(); factor *= 256; size -= 1
        if (signedp):
            rv = self.make_signed(rv, nbits)
        return rv

    def read_raw(self, size):
        self.align()
        data = chr(self.curr_byte) + self.fp.read(size-1)
        self.next()
        return data

# Low-level extraction code
def skip_bytes(bs, size):
    byte_pos, bit_pos = bs.tell()
    bs.seek(byte_pos+size)

def read_rect(bs):
    nbits   = bs.read_bits(5, False)
    xmin    = bs.read_bits(nbits, True)
    xmax    = bs.read_bits(nbits, True)
    ymin    = bs.read_bits(nbits, True)
    ymax    = bs.read_bits(nbits, True)
    return (xmin, xmax, ymin, ymax)

def read_movie_header(bs):
    rect    = read_rect(bs)
    fps     = bs.read_bytes(2, False)/256.0
    nframes = bs.read_bytes(2, False)
    return ([n/20.0 for n in rect], fps, nframes)

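def read_jpeg_table(bs, length):
    # A tag too short to hold anything but the end-of-stream marker has
    # no tables; read_raw() would misbehave if asked for 0 or fewer bytes
    if (length <= 2):
        skip_bytes(bs, length)
        return ''
    # Strip the trailing end-of-stream
    rv = bs.read_raw(length-2)
    skip_bytes(bs, 2)
    return rv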

def read_jpeg_bits(bs, length, table):
    # Get the bitmap ID
    id = bs.read_bytes(2, False)
    # Omit the opening beginning-of-stream
    skip_bytes(bs, 2)
    # Return a complete JPEG
    return (id, table+bs.read_raw(length-4))

def read_jpeg_bits_2(bs, length):
    # Contrary to documentation, this appears to consist of only a single
    # JPEG stream - there is no FF D9 FF D8 quad in the datastream
    id = bs.read_bytes(2, False)
    return (id, bs.read_raw(length-2))

def read_jpeg_bits_3(bs, length):
    # Most apps don't like SWF's two-stream-per-file business, so this
    # crudely strips out the end-of-stream / start-of-stream marker pair.
    # A little risky, but there's only a 2**-32 chance of it occurring randomly
    id          = bs.read_bytes(2, False)
    jpg_len     = bs.read_bytes(4, False)
    img_data    = bs.read_raw(jpg_len).replace('\xff\xd9\xff\xd8', '')
    alpha_data  = zlib.decompress(bs.read_raw(length-6-jpg_len))
    return (id, img_data, alpha_data)

# Extraction utility fxn
def dump_jpegs(fn):
    pn = os.path.split(fn)[0]
    fp = file(fn, 'rb')

    sig, ver, length = struct.unpack('<3sBL', fp.read(8))
    if (sig == 'CWS'):
        bs = BitStream(StringIO.StringIO(zlib.decompress(fp.read())))
    elif (sig == 'FWS'):
        bs = BitStream(fp)
    else:
        return

    rect, fps, nframes = read_movie_header(bs)
    print 'sig:        ' + sig
    print 'version:    %d' % ver
    print 'length:     %d' % length
    print 'screen:     %.1fx%.1f' % (rect[1]-rect[0], rect[3]-rect[2])
    print 'fps:        %.1f' % fps
    print 'num frames: %d' % nframes

    table = None
    while (1):
        # Read tag header
        code    = bs.read_bytes(2, False)
        tag     = code >> 6
        length  = code & 0x3f
        if (length == 63):
            length = bs.read_bytes(4, False)
        # Process JPEG tags, or skip
        if (tag == 0):
            break
        elif (tag == 8):
            table = read_jpeg_table(bs, length)
        elif (tag == 6):
            id, bits = read_jpeg_bits(bs, length, table)
            file(os.path.join(pn, '%d.jpg'%id), 'wb').write(bits)
        elif (tag == 21):
            id, bits = read_jpeg_bits_2(bs, length)
            file(os.path.join(pn, '%d.jpg'%id), 'wb').write(bits)
        elif (tag == 35):
            id, img_bits, alpha_bits = read_jpeg_bits_3(bs, length)
            img = Image.open(StringIO.StringIO(img_bits)).convert('RGBA')
            img.putalpha(Image.fromstring('L', img.size, alpha_bits))
            img.save(os.path.join(pn, '%d.png'%id))
        else:
            skip_bytes(bs, length)
