JSON Data Formats (Bloxorz)

After writing last week’s post on a Bloxorz solution algorithm, I was inspired to revisit the solver to make it a little more useful. In addition to cleaning up a few minor bugs, I preloaded it with all 33 levels of the original puzzle, so that users would no longer have to input a level before finding its solution. I wasn’t entirely happy with the data format I ended up using for the levels; my mistakes may be instructive.

Requirements

My primary requirement for the data format was that it be easily inspected and edited with available UNIX tools. I planned to lay out levels using the solver’s graphical UI, but I wanted to be able to manually verify that the layout process was creating the intended data structures. I also wanted the ability to make changes without using the UI; this would allow me to ignore many corner-cases in the graphical layout tool, since I could always edit the map directly.

My secondary requirement was that the data structure be simple, easy to understand, and have straightforward load/save code.

Solution

I settled on JSON for the representation of the level data structures. Since the solver was written in JavaScript, JSON could be effortlessly parsed into objects. JSON, being text-based, could also be handily inspected and edited with Emacs, meeting my primary requirement.

Perhaps thinking of Hacker’s map data, I defined the following (nested) data structures:

  • A level is an object, containing (along with some administrative data) a map
  • A map is an array of cells; cells are sorted on (row, column) where the upper-left-hand corner is (0, 0)
  • A cell is a 2-character string; the first character representing its type, and the second any special properties of the cell

Here is a representative snippet of the level data file:

{name: 'Level 01',
 data: ['.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', 'N0', 'N4', 'N0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', 'N0', 'N0', '#1', 'N0', 'N0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
	'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',]}

Note that newlines and whitespace are inserted into the JSON representation so that the on-screen arrangement of strings mirrors the layout of the cells those strings represent.

Problems

Although this data format met my requirements, it had one major problem: It was verbose. Six bytes were devoted to the representation of each cell – the two string characters, two quotes, a comma, and a space. In fact, only six bits are really required to represent a cell. (The cell type can be represented in 3 bits, since there are only 7 possible types, and only one bit is needed for each of the 3 possible special properties a cell can have.) This means that my data format is approximately 8 times bigger than it needs to be.

Alternatives

One simple solution is to cut out the spaces between strings. This will shrink the map data by ~17%, but (in my opinion) at the expense of some readibility:

{name: 'Level 01',
 data: ['.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','N0','N0','N0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','N0','N4','N0','N0','N0','N0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','N0','N0','N0','N0','N0','N0','N0','N0','N0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','N0','N0','N0','N0','N0','N0','N0','N0','N0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','N0','N0','#1','N0','N0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','N0','N0','N0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
	'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',]}

Another approach is to make the data structures slightly more complex, redefining them such that:

  • A map is a 1-D array of rows, stored from top to bottom
  • A row is a space-delimited string of cells, stored from left to right
  • Each cell is stored as a 2-character pair, as above
{name: 'Level 01',
 data: ['.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 N0 N0 N0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 N0 N4 N0 N0 N0 N0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 N0 N0 N0 N0 N0 N0 N0 N0 N0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 N0 N0 N0 N0 N0 N0 N0 N0 N0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 N0 N0 #1 N0 N0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 N0 N0 N0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
	'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',]}

This approach will shrink the map data by ~45%, at no real cost to complexity or legibility. On balance, I wish I’d gone this way.

Share and Enjoy:
  • Twitter
  • Facebook
  • Digg
  • Reddit
  • HackerNews
  • del.icio.us
  • Google Bookmarks
  • Slashdot
This entry was posted in Web stuff. Bookmark the permalink.

Comments are closed.