Python Performance Tuning (con’t)

Let’s continue tuning our KenKen puzzle solver. When we left off yesterday, we’d cut execution time by 30% with a one-line change to the program, but there is still much room for improvement.

Recap

After our last change, the top few items in the hotshot stats looked like this:

         94797 function calls (35465 primitive calls) in 1377.959 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
21749/4177  282.142    0.013  456.155    0.109 neknek_1.py:15(can_make_sum_p)
       44  179.129    4.071 1354.233   30.778 neknek_1.py:112(constrain)
24660/11003  174.013    0.007  340.939    0.031 neknek_1.py:18()
14592/1357  160.274    0.011  308.033    0.227 neknek_1.py:21(can_make_product_p)
18409/3670  147.759    0.008  271.829    0.074 neknek_1.py:24()
      635  121.681    0.192 1072.522    1.689 neknek_1.py:44(apply)

	... clipped ...

Memoization

The can_make_sum_p() function is still consuming the plurality of the time. My guess is that it’s called with fairly repetitive arguments, so it’s a good candidate for memoization. The natural (quick-and-dirty) way to code this would be:

# Can we select exactly one member from each set s.t. the sum of all selected members is t?
d_sum_queries = {}
def can_make_sum_p(t, sets):
	if (len(sets) == 1): return (t in sets[0])
	r = d_sum_queries.get((t, sets))
	if (r == None):
		head = sets[0]; tail = sets[1:]
		r = any(can_make_sum_p(t-e, tail) for e in head if e <= t)
		d_sum_queries[(t, sets)] = r
	return r

However, this doesn't work. The sets argument is a list of lists, and lists aren't hashable. We'll need to revise Constraint.apply():

class Constraint(object):
	# ... snip ...
	def apply(self, solution):
		d_sets = dict((c, tuple(map(int, solution[c]))) for c in self.cells); d_bad = {}
		for k,v in d_sets.items():
			others = tuple(ov for ok,ov in d_sets.items() if ok != k)
			d_bad[k] = ''.join(str(e) for e in v if not self._test_component(e, others))
		return d_bad

We should also clear the memoization dictionary from time to time; A good time to do so seems to be when the solver is first invoked:

def solve(puzzle):
	# ... snip ...
	# Solve
	d_sum_queries = {}
	symbols = string.digits[1:1+puzzle.size]
	return search(constrain(dict((c,symbols) for c in d_constraints.keys()), *puzzle.cages))

Timeit reports that these changes save around 0.03s per loop (a change from ~0.29s/loop before the change to ~0.26s/loop afterwards). This is a gain of a further 10%. The hotshot stats tell a similarly encouraging story:

         53889 function calls (29397 primitive calls) in 901.370 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       44  166.225    3.778  879.568   19.990 neknek_1.py:117(constrain)
      613  159.227    0.260  615.020    1.003 neknek_1.py:49(apply)
12704/1182  116.667    0.009  221.817    0.188 neknek_1.py:26(can_make_product_p)
16039/3214  105.150    0.007  194.698    0.061 neknek_1.py:29()
     1028   98.114    0.095   98.114    0.095 neknek_1.py:83(apply)
     2397   53.202    0.022   53.202    0.022 neknek_1.py:50()
     2333   49.252    0.021  378.650    0.162 neknek_1.py:53()
4368/4352   44.640    0.010   44.814    0.010 neknek_1.py:16(can_make_sum_p)
     4366   41.279    0.009   86.093    0.020 neknek_1.py:61(_test_component)
     5608   23.941    0.004   23.941    0.004 neknek_1.py:52()
     1247   13.506    0.011  235.324    0.189 neknek_1.py:72(_test_component)
     44/1    6.579    0.150  748.179  748.179 neknek_1.py:137(search)
      898    5.546    0.006    5.546    0.006 neknek_1.py:68(_test_component)
     1213    5.536    0.005    5.536    0.005 neknek_1.py:142()
     66/2    2.452    0.037  747.517  373.758 neknek_1.py:144()
      278    2.435    0.009    2.435    0.009 neknek_1.py:79(_test_component)
        1    2.338    2.338  899.578  899.578 neknek_1.py:105(solve)
        1    1.144    1.144    1.685    1.685 neknek_1.py:96(__init__)

	... clipped ...

Generalization

Since can_make_product_p() is also consuming a lot of time, and since it will yield to exactly the same techniques applied to can_make_sum_p(), let's change its base case, and memoize it as well:

# Can we select exactly one member from each set s.t. the product of all selected members is t?
d_prod_queries = {}
def can_make_product_p(t, sets):
	if (len(sets) == 1): return (t in sets[0])
	r = d_prod_queries.get((t, sets))
	if (r == None):
		head = sets[0]; tail = sets[1:]
		r = any(can_make_product_p(t/e, tail) for e in head if not t%e)
		d_prod_queries[(t, sets)] = r
	return r

We should also clear the product memoization dictionary when appropriate:

def solve(puzzle):
	# ... snip ...
	# Solve
	d_sum_queries = {}
	d_prod_queries = {}
	symbols = string.digits[1:1+puzzle.size]
	return search(constrain(dict((c,symbols) for c in d_constraints.keys()), *puzzle.cages))

Timeit reports that these changes save around another 0.03s per loop (a change from ~0.26s/loop before the change to ~0.23s/loop afterwards). This is a gain of a further 11%. The hotshot stats tell a similarly encouraging story:

         27560 function calls (27332 primitive calls) in 682.838 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      638  163.215    0.256  415.008    0.650 neknek_1.py:54(apply)
       44  156.274    3.552  661.793   15.041 neknek_1.py:122(constrain)
      959   90.298    0.094   90.298    0.094 neknek_1.py:88(apply)
     2526   54.629    0.022   54.629    0.022 neknek_1.py:55()
     2437   49.478    0.020  171.874    0.071 neknek_1.py:58()
4637/4588   46.830    0.010   47.337    0.010 neknek_1.py:16(can_make_sum_p)
     4601   42.676    0.009   90.013    0.020 neknek_1.py:66(_test_component)
     6086   25.290    0.004   25.290    0.004 neknek_1.py:57()
     1249   12.826    0.010   25.222    0.020 neknek_1.py:77(_test_component)
     1188   12.396    0.010   12.396    0.010 neknek_1.py:27(can_make_product_p)
     44/1    6.278    0.143  608.917  608.917 neknek_1.py:142(search)
     1213    5.042    0.004    5.042    0.004 neknek_1.py:147()
      897    4.843    0.005    4.843    0.005 neknek_1.py:73(_test_component)
     66/2    2.379    0.036  608.273  304.137 neknek_1.py:149()
        1    2.337    2.337  680.882  680.882 neknek_1.py:110(solve)
      275    2.317    0.008    2.317    0.008 neknek_1.py:84(_test_component)
        1    1.301    1.301    1.847    1.847 neknek_1.py:101(__init__)

	... clipped ...

Safety

At this point, we should attend to a minor corner case. Since we changed the base cases of can_make_sum_p() and can_make_product_p() from 0 to 1, these functions can no longer be safely called with empty sets arguments. In turn, this means that they can no longer be used by Sum and Prod constraints applied to only one cell. (Such constraints would invoke _test_component() with an empty context.) Therefore, let's code the Sum and Prod constructors to throw exceptions if these constraints aren't being applied to at least two cells:

class Sum(Constraint):
	def __init__(self, value, *cells):
		Constraint.__init__(self, value, *cells)
		if (len(cells) < 2): raise Exception('Sum constraints must be applied to 2 or more cells')
	def _test_component(self, component, context):
		return (self.value>=component) and can_make_sum_p(self.value-component, context)
class Prod(Constraint):
	def __init__(self, value, *cells):
		Constraint.__init__(self, value, *cells)
		if (len(cells) < 2): raise Exception('Prod constraints must be applied to 2 or more cells')
	def _test_component(self, component, context):
		return (not self.value%component) and can_make_product_p(self.value/component, context)

Coming Attractions

Tomorrow we'll consider some slightly larger-scale optimizations. Until then, you can view the latest version of the solver.

Share and Enjoy:
  • Twitter
  • Facebook
  • Digg
  • Reddit
  • HackerNews
  • del.icio.us
  • Google Bookmarks
  • Slashdot
This entry was posted in Projects, Python. Bookmark the permalink.

Comments are closed.