Generating HTML from Python

Occasionally I write scripts that generate HTML documents. I understand the view that HTML is “object code” whose formatting doesn’t matter, but I’ve never been able to adopt that position completely. As a result, I usually try to emit “pretty-printed” HTML, and I have developed a few small classes to help with that.

Code

Here are the utility classes I use; since WordPress mangles code, they are also available as a download:

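class HtmlTag(object):
    # Elements whose end tag is forbidden in HTML 4.01 (not exhaustive)
    endless = set(['link', 'input'])

    def __init__(self, tag, *children, **kw_args):
        self.tag        = tag
        self.children   = children
        # Only the 'attributes' keyword is used; other keywords are ignored
        self.attributes = kw_args.get('attributes', {})

    def to_html(self, c_indent='', i_indent='\t'):
        # Returns a list of lines; c_indent is the current indent,
        # i_indent is added for each nesting level
        if self.attributes:
            attrs = ' '.join('%s="%s"' % (k, v) for k, v in self.attributes.items())
            rv = ['%s<%s %s>' % (c_indent, self.tag, attrs)]
        else:
            rv = ['%s<%s>' % (c_indent, self.tag)]
        # Each child is rendered one indent level deeper
        for c in self.children:
            rv.extend(c.to_html(c_indent + i_indent, i_indent))
        if self.tag.lower() not in self.endless:
            rv.append('%s</%s>' % (c_indent, self.tag))
        return rv

class HtmlLineTag(HtmlTag):
    # Renders the element and all of its children on a single line
    def to_html(self, c_indent='', i_indent='\t'):
        return [c_indent + ''.join(HtmlTag.to_html(self, '', ''))]

class HtmlText(object):
    def __init__(self, text):
        self.text = text

    def to_html(self, c_indent='', i_indent='\t'):
        # One output line per input line; the trailing space keeps words
        # apart when HtmlLineTag joins the lines together
        lines = self.text.split('\n')
        return [c_indent + l + ' ' for l in lines[:-1]] + [c_indent + lines[-1]]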
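HtmlText and the attribute formatting pass strings through verbatim, so text containing <, > or &, or an attribute value containing a double quote, produces broken markup. Here is a minimal sketch of an escaping text node, assuming Python 3 and its standard html.escape function; the names EscapedText and escape_attributes are hypothetical and not part of the classes above. EscapedText has the same to_html(c_indent, i_indent) method as HtmlText, so it can be used as a child of HtmlTag in its place:

```python
import html


class EscapedText(object):
    """Hypothetical variant of HtmlText that escapes special characters."""

    def __init__(self, text):
        self.text = text

    def to_html(self, c_indent='', i_indent='\t'):
        # html.escape converts & < > " and ' into character references
        lines = html.escape(self.text).split('\n')
        # Same layout rule as HtmlText: trailing space on all but the last line
        return [c_indent + l + ' ' for l in lines[:-1]] + [c_indent + lines[-1]]


def escape_attributes(attributes):
    """Hypothetical helper: escape attribute values before handing them to HtmlTag."""
    return {k: html.escape(str(v), quote=True) for k, v in attributes.items()}
```

Used as HtmlTag('P', EscapedText('Fish & chips <cheap>')), the paragraph text comes out as Fish &amp; chips &lt;cheap&gt;. Raw markup can still be passed through the original HtmlText when that is what you want.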

A few qualifiers:

  • These classes are designed to emit HTML 4.01, not XHTML
  • These classes don’t support DOCTYPE declarations; those must be added separately
  • HtmlTag.endless is *not* an exhaustive list of endless HTML elements (i.e. those without end tags); add your own as needed
  • There might be a better, built-in way to do this in Python, but this was fast enough to write that I’m too lazy to look; if you know of one, feel free to tell me
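On the last point, one built-in option is xml.etree.ElementTree, assuming Python 3.9 or later (which added ET.indent() for in-place pretty-printing). Its tostring(..., method='html') serializer omits end tags for HTML void elements such as link and input, and it escapes text and attribute values. This sketch rebuilds the anchor example from this post with lowercase tag names; it is one possibility, not a drop-in replacement for the classes above:

```python
import xml.etree.ElementTree as ET

# Build <p><a href="...">click here</a> <input ...></p> as an element tree
p = ET.Element('p')
a = ET.SubElement(p, 'a', href='http://www.example.com')
a.text = 'click here'
# A void element: the 'html' method writes no </input> end tag for it
ET.SubElement(p, 'input', type='text', name='q')

# Insert newlines and one tab per nesting level, in place (Python 3.9+)
ET.indent(p, space='\t')

html_text = ET.tostring(p, encoding='unicode', method='html')
print(html_text)
```

ET.indent only adds whitespace around elements that have children, so short elements like the anchor stay on one line, much like HtmlLineTag.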

Examples

Defining and printing a simple <P> tag:

>>> t = HtmlTag('P', HtmlText('This is a test'))
>>> print('\n'.join(t.to_html()))
<P>
    This is a test
</P>

For short tags that you don’t want split across several lines, use HtmlLineTag, which renders the element and its contents on a single line:

>>> a = HtmlLineTag('A', HtmlText('click here'), attributes={'href':"http://www.example.com"})
>>> t = HtmlTag('P', a)
>>> print('\n'.join(t.to_html()))
<P>
    <A href="http://www.example.com">click here</A>
</P>

You’ll usually want to apply some initial indenting so that the output fits inside an existing HTML BODY element:

>>> print('\n'.join(t.to_html('\t\t')))
        <P>
            <A href="http://www.example.com">click here</A>
        </P>
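If the script produces a whole page rather than a fragment for an existing BODY, the DOCTYPE can simply be prepended as a string, since the classes never emit one. A sketch under these assumptions: the classes are repeated in condensed form so the block runs on its own, the HTML 4.01 Strict DOCTYPE is used, 'meta' is added to endless (its end tag is forbidden in HTML 4.01), and i_indent is two spaces instead of a tab. The render_page function is hypothetical:

```python
# Condensed copies of the classes above, so this block runs on its own
class HtmlTag(object):
    endless = set(['link', 'input', 'meta'])  # 'meta' added for this sketch

    def __init__(self, tag, *children, **kw_args):
        self.tag = tag
        self.children = children
        self.attributes = kw_args.get('attributes', {})

    def to_html(self, c_indent='', i_indent='\t'):
        if self.attributes:
            attrs = ' '.join('%s="%s"' % (k, v) for k, v in self.attributes.items())
            rv = ['%s<%s %s>' % (c_indent, self.tag, attrs)]
        else:
            rv = ['%s<%s>' % (c_indent, self.tag)]
        for c in self.children:
            rv.extend(c.to_html(c_indent + i_indent, i_indent))
        if self.tag.lower() not in self.endless:
            rv.append('%s</%s>' % (c_indent, self.tag))
        return rv


class HtmlLineTag(HtmlTag):
    def to_html(self, c_indent='', i_indent='\t'):
        return [c_indent + ''.join(HtmlTag.to_html(self, '', ''))]


class HtmlText(object):
    def __init__(self, text):
        self.text = text

    def to_html(self, c_indent='', i_indent='\t'):
        lines = self.text.split('\n')
        return [c_indent + l + ' ' for l in lines[:-1]] + [c_indent + lines[-1]]


# HTML 4.01 Strict DOCTYPE; prepended by hand because the classes don't emit it
DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           '"http://www.w3.org/TR/html4/strict.dtd">')


def render_page(title, *body_children):
    """Hypothetical helper: return a complete HTML 4.01 page as one string."""
    page = HtmlTag('HTML',
                   HtmlTag('HEAD',
                           HtmlLineTag('META', attributes={'http-equiv': 'Content-Type',
                                                           'content': 'text/html; charset=utf-8'}),
                           HtmlLineTag('TITLE', HtmlText(title))),
                   HtmlTag('BODY', *body_children))
    # Two spaces per nesting level instead of the default tab
    return '\n'.join([DOCTYPE] + page.to_html(i_indent='  '))


print(render_page('Test page', HtmlTag('P', HtmlText('This is a test'))))
```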

Subclasses are easy to write:

class Cell(HtmlLineTag):
    def __init__(self, text):
        HtmlLineTag.__init__(self, 'TD', HtmlText(text))

class Row(HtmlTag):
    def __init__(self, items):
        # A list comprehension rather than map(): on Python 3, map() returns
        # a one-shot iterator, so a second to_html() call would lose the cells
        HtmlTag.__init__(self, 'TR', *[Cell(i) for i in items])

They can result in simpler, more legible generation code:

>>> rows = ['Top', 'Center', 'Bottom']
>>> cols = ['Left', 'Middle', 'Right']
>>> t = HtmlTag('TABLE', *[Row(['%s %s'%(c,r) for c in cols]) for r in rows])
>>> print('\n'.join(t.to_html('\t\t')))
        <TABLE>
            <TR>
                <TD>Left Top</TD>
                <TD>Middle Top</TD>
                <TD>Right Top</TD>
            </TR>
            <TR>
                <TD>Left Center</TD>
                <TD>Middle Center</TD>
                <TD>Right Center</TD>
            </TR>
            <TR>
                <TD>Left Bottom</TD>
                <TD>Middle Bottom</TD>
                <TD>Right Bottom</TD>
            </TR>
        </TABLE>
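The same pattern extends to other table parts, including ones that carry attributes. A sketch, assuming you want a header row of TH cells and a column-spanning title; HeaderCell and HeaderRow are hypothetical names, and the base classes are repeated in condensed form so the block runs on its own:

```python
# Condensed copies of the base classes above, so this block runs on its own
class HtmlTag(object):
    endless = set(['link', 'input'])

    def __init__(self, tag, *children, **kw_args):
        self.tag = tag
        self.children = children
        self.attributes = kw_args.get('attributes', {})

    def to_html(self, c_indent='', i_indent='\t'):
        if self.attributes:
            attrs = ' '.join('%s="%s"' % (k, v) for k, v in self.attributes.items())
            rv = ['%s<%s %s>' % (c_indent, self.tag, attrs)]
        else:
            rv = ['%s<%s>' % (c_indent, self.tag)]
        for c in self.children:
            rv.extend(c.to_html(c_indent + i_indent, i_indent))
        if self.tag.lower() not in self.endless:
            rv.append('%s</%s>' % (c_indent, self.tag))
        return rv


class HtmlLineTag(HtmlTag):
    def to_html(self, c_indent='', i_indent='\t'):
        return [c_indent + ''.join(HtmlTag.to_html(self, '', ''))]


class HtmlText(object):
    def __init__(self, text):
        self.text = text

    def to_html(self, c_indent='', i_indent='\t'):
        lines = self.text.split('\n')
        return [c_indent + l + ' ' for l in lines[:-1]] + [c_indent + lines[-1]]


class HeaderCell(HtmlLineTag):
    """Hypothetical: a TH cell, optionally spanning several columns."""
    def __init__(self, text, colspan=None):
        attributes = {'colspan': colspan} if colspan else {}
        HtmlLineTag.__init__(self, 'TH', HtmlText(text), attributes=attributes)


class HeaderRow(HtmlTag):
    """Hypothetical: a TR made entirely of TH cells."""
    def __init__(self, items):
        HtmlTag.__init__(self, 'TR', *[HeaderCell(i) for i in items])


t = HtmlTag('TABLE',
            HtmlTag('TR', HeaderCell('Positions', colspan=2)),
            HeaderRow(['Left', 'Right']),
            attributes={'border': 1})
print('\n'.join(t.to_html('\t\t')))
```

Passing attributes through the base __init__ keeps each subclass to a couple of lines, the same way Cell and Row do.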

None of this is rocket science, but it makes me happy.

This entry was posted in Planet Microsoft, Python, UNIX, Web stuff.