# Python Performance Tuning (con’t)

Let’s continue tuning our KenKen puzzle solver. When we left off yesterday, we’d cut execution time by 30% with a one-line change to the program, but there is still much room for improvement.

### Recap

After our last change, the top few items in the `hotshot` stats looked like this:

``````         94797 function calls (35465 primitive calls) in 1377.959 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
21749/4177  282.142    0.013  456.155    0.109 neknek_1.py:15(can_make_sum_p)
44  179.129    4.071 1354.233   30.778 neknek_1.py:112(constrain)
24660/11003  174.013    0.007  340.939    0.031 neknek_1.py:18()
14592/1357  160.274    0.011  308.033    0.227 neknek_1.py:21(can_make_product_p)
18409/3670  147.759    0.008  271.829    0.074 neknek_1.py:24()
635  121.681    0.192 1072.522    1.689 neknek_1.py:44(apply)

... clipped ...``````

### Memoization

The `can_make_sum_p()` function is still consuming the plurality of the time. My guess is that it’s called with fairly repetitive arguments, so it’s a good candidate for memoization. The natural (quick-and-dirty) way to code this would be:

``````# Can we select exactly one member from each set s.t. the sum of all selected members is t?
d_sum_queries = {}
def can_make_sum_p(t, sets):
if (len(sets) == 1): return (t in sets[0])
r = d_sum_queries.get((t, sets))
if (r == None):
head = sets[0]; tail = sets[1:]
r = any(can_make_sum_p(t-e, tail) for e in head if e <= t)
d_sum_queries[(t, sets)] = r
return r``````

However, this doesn't work. The `sets` argument is a list of lists, and lists aren't hashable. We'll need to revise `Constraint.apply()`:

``````class Constraint(object):
# ... snip ...
def apply(self, solution):
d_sets = dict((c, tuple(map(int, solution[c]))) for c in self.cells); d_bad = {}
for k,v in d_sets.items():
others = tuple(ov for ok,ov in d_sets.items() if ok != k)
d_bad[k] = ''.join(str(e) for e in v if not self._test_component(e, others))

We should also clear the memoization dictionary from time to time; A good time to do so seems to be when the solver is first invoked:

``````def solve(puzzle):
# ... snip ...
# Solve
d_sum_queries = {}
symbols = string.digits[1:1+puzzle.size]
return search(constrain(dict((c,symbols) for c in d_constraints.keys()), *puzzle.cages))``````

Timeit reports that these changes save around 0.03s per loop (a change from ~0.29s/loop before the change to ~0.26s/loop afterwards). This is a gain of a further 10%. The `hotshot` stats tell a similarly encouraging story:

``````         53889 function calls (29397 primitive calls) in 901.370 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
44  166.225    3.778  879.568   19.990 neknek_1.py:117(constrain)
613  159.227    0.260  615.020    1.003 neknek_1.py:49(apply)
12704/1182  116.667    0.009  221.817    0.188 neknek_1.py:26(can_make_product_p)
16039/3214  105.150    0.007  194.698    0.061 neknek_1.py:29()
1028   98.114    0.095   98.114    0.095 neknek_1.py:83(apply)
2397   53.202    0.022   53.202    0.022 neknek_1.py:50()
2333   49.252    0.021  378.650    0.162 neknek_1.py:53()
4368/4352   44.640    0.010   44.814    0.010 neknek_1.py:16(can_make_sum_p)
4366   41.279    0.009   86.093    0.020 neknek_1.py:61(_test_component)
5608   23.941    0.004   23.941    0.004 neknek_1.py:52()
1247   13.506    0.011  235.324    0.189 neknek_1.py:72(_test_component)
44/1    6.579    0.150  748.179  748.179 neknek_1.py:137(search)
898    5.546    0.006    5.546    0.006 neknek_1.py:68(_test_component)
1213    5.536    0.005    5.536    0.005 neknek_1.py:142()
66/2    2.452    0.037  747.517  373.758 neknek_1.py:144()
278    2.435    0.009    2.435    0.009 neknek_1.py:79(_test_component)
1    2.338    2.338  899.578  899.578 neknek_1.py:105(solve)
1    1.144    1.144    1.685    1.685 neknek_1.py:96(__init__)

... clipped ...``````

### Generalization

Since `can_make_product_p()` is also consuming a lot of time, and since it will yield to exactly the same techniques applied to `can_make_sum_p()`, let's change its base case, and memoize it as well:

``````# Can we select exactly one member from each set s.t. the product of all selected members is t?
d_prod_queries = {}
def can_make_product_p(t, sets):
if (len(sets) == 1): return (t in sets[0])
r = d_prod_queries.get((t, sets))
if (r == None):
head = sets[0]; tail = sets[1:]
r = any(can_make_product_p(t/e, tail) for e in head if not t%e)
d_prod_queries[(t, sets)] = r
return r``````

We should also clear the product memoization dictionary when appropriate:

``````def solve(puzzle):
# ... snip ...
# Solve
d_sum_queries = {}
d_prod_queries = {}
symbols = string.digits[1:1+puzzle.size]
return search(constrain(dict((c,symbols) for c in d_constraints.keys()), *puzzle.cages))``````

Timeit reports that these changes save around another 0.03s per loop (a change from ~0.26s/loop before the change to ~0.23s/loop afterwards). This is a gain of a further 11%. The `hotshot` stats tell a similarly encouraging story:

``````         27560 function calls (27332 primitive calls) in 682.838 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
638  163.215    0.256  415.008    0.650 neknek_1.py:54(apply)
44  156.274    3.552  661.793   15.041 neknek_1.py:122(constrain)
959   90.298    0.094   90.298    0.094 neknek_1.py:88(apply)
2526   54.629    0.022   54.629    0.022 neknek_1.py:55()
2437   49.478    0.020  171.874    0.071 neknek_1.py:58()
4637/4588   46.830    0.010   47.337    0.010 neknek_1.py:16(can_make_sum_p)
4601   42.676    0.009   90.013    0.020 neknek_1.py:66(_test_component)
6086   25.290    0.004   25.290    0.004 neknek_1.py:57()
1249   12.826    0.010   25.222    0.020 neknek_1.py:77(_test_component)
1188   12.396    0.010   12.396    0.010 neknek_1.py:27(can_make_product_p)
44/1    6.278    0.143  608.917  608.917 neknek_1.py:142(search)
1213    5.042    0.004    5.042    0.004 neknek_1.py:147()
897    4.843    0.005    4.843    0.005 neknek_1.py:73(_test_component)
66/2    2.379    0.036  608.273  304.137 neknek_1.py:149()
1    2.337    2.337  680.882  680.882 neknek_1.py:110(solve)
275    2.317    0.008    2.317    0.008 neknek_1.py:84(_test_component)
1    1.301    1.301    1.847    1.847 neknek_1.py:101(__init__)

... clipped ...``````

### Safety

At this point, we should attend to a minor corner case. Since we changed the base cases of `can_make_sum_p()` and `can_make_product_p()` from 0 to 1, these functions can no longer be safely called with empty `sets` arguments. In turn, this means that they can no longer be used by `Sum` and `Prod` constraints applied to only one cell. (Such constraints would invoke `_test_component()` with an empty context.) Therefore, let's code the `Sum` and `Prod` constructors to throw exceptions if these constraints aren't being applied to at least two cells:

``````class Sum(Constraint):
def __init__(self, value, *cells):
Constraint.__init__(self, value, *cells)
if (len(cells) < 2): raise Exception('Sum constraints must be applied to 2 or more cells')
def _test_component(self, component, context):
return (self.value>=component) and can_make_sum_p(self.value-component, context)``````
``````class Prod(Constraint):
def __init__(self, value, *cells):
Constraint.__init__(self, value, *cells)
if (len(cells) < 2): raise Exception('Prod constraints must be applied to 2 or more cells')
def _test_component(self, component, context):
return (not self.value%component) and can_make_product_p(self.value/component, context)``````

### Coming Attractions

Tomorrow we'll consider some slightly larger-scale optimizations. Until then, you can view the latest version of the solver.

Share and Enjoy:
This entry was posted in Projects, Python. Bookmark the permalink.