Python Performance Tuning (con’t)

Let’s wrap up our hunt for performance improvements in our KenKen puzzle solver. This sort of thing can go on nearly endlessly, of course, but after today I think we’ll have gotten most of the easy stuff, and seen dramatic gains in speed for our trouble.

Recap

After our last changes, the top few items in the hotshot stats look like this:

         28901 function calls (28655 primitive calls) in 771.505 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      692  183.663    0.265  464.785    0.672 neknek_1.py:54(apply)
       44  180.457    4.101  749.238   17.028 neknek_1.py:128(constrain)
     1027  103.779    0.101  103.779    0.101 neknek_1.py:94(apply)
     2721   61.978    0.023   61.978    0.023 neknek_1.py:55()
     2643   55.001    0.021  191.786    0.073 neknek_1.py:58()

	... clipped ...

Analysis

To make sense of this, it’s necessary to have to code to hand; the version timed by this code is available here. The execution time is now spread out a bit more between functions; we’re no longer seeing the major hotspots we’ve seen over the last two days. The majority of the execution time is now spent in three functions:

Constraint.apply()

class Constraint(object):
	# ... snip ...
	def apply(self, solution):
		d_sets = dict((c, tuple(map(int, solution[c]))) for c in self.cells); d_bad = {}
		for k,v in d_sets.items():
			others = tuple(ov for ok,ov in d_sets.items() if ok != k)
			d_bad[k] = ''.join(str(e) for e in v if not self._test_component(e, others))
		return d_bad

constrain()

def constrain(solution, *constraints):
	queue = set(constraints)
	while (queue):
		constraint = queue.pop()
		for cell, bad_choices in constraint.apply(solution).items():
			values = solution[cell]
			for choice in bad_choices:
				values = values.replace(choice, '')
			if (not values):
				return False
			if (solution[cell] == values):
				continue
			solution[cell] = values
			queue.update(d_constraints[cell])
	return solution

Set.apply()

class Set(Constraint):
	def apply(self, solution):
		# For each cell:
		d_bad = {}
		for c in self.cells:
			# If a cell has only one possible value, remove that value from all other cells
			if (len(solution[c]) != 1): continue
			for c2 in self.cells:
				if (c2 != c): d_bad[c2] = d_bad.get(c2,'') + solution[c]
		return d_bad

Looking at this code, we can make several observations:

It’s probably not very efficient for Constraints to return dictionaries of items to remove from the solution. We might try returning lists of (cell ID, cell possibilities) pairs instead, in which the possibilities returned are all those not precluded by the constraint.
Constraint.apply() creates a d_sets dictionary; a list would work as well for its purposes
The constrain() function processes all Constraints associated with any cell that it changes, including the constraint that triggered the change. This is pretty silly, but done because the apply() method of Sets does not necessarily eliminate all illegal values when it’s invoked (though it will always eliminate some). This is a peculiarity of this particular class; if it were corrected, constrain() could be altered to eliminate some double-processing of Constraints.

Simple Things First

Let’s eliminate the d_sets dictionary from Constraint.apply(), and see if that gets us anywhere.

class Constraint(object):
	# ... snip ...
	def apply(self, solution):
		l_sets = [(c, tuple(map(int, solution[c]))) for c in self.cells]; d_bad = {}
		for k,v in l_sets:
			others = tuple(ov for ok,ov in l_sets if ok != k)
			d_bad[k] = ''.join(str(e) for e in v if not self._test_component(e, others))
		return d_bad

It doesn’t. If there’s any improvement at all, I can’t detect it. Still, no harm done, so let’s leave the change and move on.

Good Lists

Let’s try returning good lists, instead of bad dictionaries. We’ll have to change four pieces of code to make this work:

Constraint.apply()

class Constraint(object):
	# ... snip ...
	def apply(self, solution):
		l_sets = [(c, tuple(map(int, solution[c]))) for c in self.cells]; l_good = []
		for k,v in l_sets:
			others = tuple(ov for ok,ov in l_sets if ok != k)
			l_good.append((k, ''.join(str(e) for e in v if self._test_component(e, others))))
		return l_good

Assert.apply()

class Assert(Constraint):
	def apply(self, solution):
		v = str(self.value)
		return ((c, v) for c in self.cells)

Set.apply()

class Set(Constraint):
	def apply(self, solution):
		# For each cell:
		l_good = [[c, solution[c]] for c in self.cells]
		for c in self.cells:
			# If a cell has only one possible value, remove that value from all other cells
			if (len(solution[c]) != 1): continue
			for e in l_good:
				if (e[0] != c): e[1] = e[1].replace(solution[c], '')
		return l_good

constrain()

def constrain(solution, *constraints):
	queue = set(constraints)
	while (queue):
		constraint = queue.pop()
		for cell, values in constraint.apply(solution):
			if (not values):
				return False
			if (solution[cell] == values):
				continue
			solution[cell] = values
			queue.update(d_constraints[cell])
	return solution

This seems to buy us something; it appears that we save about 0.02s per loop (a change from ~0.235s/loop before the change to ~0.215s/loop afterwards). It’s difficult to measure, but it looks like we might pick up 9% here.

Smarter Sets

Let’s try re-coding Set.apply() to eliminate every prohibited value, and see what the effects are.

class Set(Constraint):
	def _remove(self, l_good, cell, value):
		for p in l_good:
			if (p[0] == cell): continue
			if (value in p[1]):
				p[1] = p[1].replace(value, '')
				if (len(p[1]) == 1):
					self._remove(l_good, *p)
	def apply(self, solution):
		# For each cell:
		l_good = [[c, solution[c]] for c in self.cells]
		for c,v in l_good:
			# If a cell has only one possible value, remove that value from all other cells
			if (len(v) == 1): self._remove(l_good, c, v)
		return l_good

This appears to be a big win; per-loop times are now hovering aroung ~.165s/loop, for perhaps another 20% gain. Now, let’s try altering constrain() s.t. it doesn’t re-process Constraints:

def constrain(solution, *constraints):
	queue = set(constraints)
	while (queue):
		constraint = queue.pop()
		for cell, values in constraint.apply(solution):
			if (not values):
				return False
			if (solution[cell] == values):
				continue
			solution[cell] = values
			queue.update(d_constraints[cell])
		queue.discard(constraint)
	return solution

This change sends per-loop times down towards .14/loop, which is another 15% pickup.

Conclusion

Back-to-back tests with the original version of the solver indicate that our optimizations reduced per-loop time from ~0.48s to ~0.14s. This is a 70% reduction in runtime, or a 3.5x speedup. The final hotshot stats look like this:

         23969 function calls (23682 primitive calls) in 467.398 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      407  143.526    0.353  330.906    0.813 neknek_1.py:54(apply)
     5692   69.164    0.012  171.720    0.030 neknek_1.py:58()
     3323   41.876    0.013   41.876    0.013 neknek_1.py:16(can_make_sum_p)
      745   39.288    0.053   76.232    0.102 neknek_1.py:102(apply)
2122/1967   36.944    0.017   36.944    0.019 neknek_1.py:95(_remove)
       44   36.592    0.832  443.774   10.086 neknek_1.py:134(constrain)
     3338   35.882    0.011   77.758    0.023 neknek_1.py:70(_test_component)
     3921   15.660    0.004   15.660    0.004 neknek_1.py:57()
  783/780    9.552    0.012    9.611    0.012 neknek_1.py:27(can_make_product_p)
      845    9.416    0.011   19.027    0.023 neknek_1.py:84(_test_component)
     44/1    7.007    0.159  404.628  404.628 neknek_1.py:152(search)
     1213    6.120    0.005    6.120    0.005 neknek_1.py:157()
      598    3.737    0.006    3.737    0.006 neknek_1.py:77(_test_component)
     66/2    2.674    0.041  403.959  201.979 neknek_1.py:159()
        1    2.401    2.401  465.210  465.210 neknek_1.py:122(solve)
      193    2.034    0.011    2.034    0.011 neknek_1.py:91(_test_component)
        1    1.281    1.281    2.084    2.084 neknek_1.py:113(__init__)

	... clipped ...

It’s difficult to stop an exercise like this; one naturally looks at Constraint.apply(), and wonders what the heck it’s doing that’s taking so long. Is it the int <-> string conversions? Is there a way around them? (It does look like you can pick up ~20% if you memoize these conversions.)

But I suppose this has to stop somewhere, and it’s stopping here. For now. I hope you found it interesting. Please feel free to examine the final code.

Share and Enjoy:

Recap

Analysis

Simple Things First

Good Lists

Smarter Sets

Conclusion

Services

Find Stuff

Pages

Buy My Apps

Other Stuff I’ve Built

Book Club

Archives

Categories

Blogroll