Python Performance Tuning (con’t)

Let’s wrap up our hunt for performance improvements in our KenKen puzzle solver. This sort of thing can go on nearly endlessly, of course, but after today I think we’ll have gotten most of the easy stuff, and seen dramatic gains in speed for our trouble.

Recap

After our last changes, the top few items in the `hotshot` stats look like this:

``````         28901 function calls (28655 primitive calls) in 771.505 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
692  183.663    0.265  464.785    0.672 neknek_1.py:54(apply)
44  180.457    4.101  749.238   17.028 neknek_1.py:128(constrain)
1027  103.779    0.101  103.779    0.101 neknek_1.py:94(apply)
2721   61.978    0.023   61.978    0.023 neknek_1.py:55()
2643   55.001    0.021  191.786    0.073 neknek_1.py:58()

... clipped ...``````

Analysis

To make sense of this, it’s necessary to have to code to hand; the version timed by this code is available here. The execution time is now spread out a bit more between functions; we’re no longer seeing the major hotspots we’ve seen over the last two days. The majority of the execution time is now spent in three functions:

`Constraint.apply()`

``````class Constraint(object):
# ... snip ...
def apply(self, solution):
d_sets = dict((c, tuple(map(int, solution[c]))) for c in self.cells); d_bad = {}
for k,v in d_sets.items():
others = tuple(ov for ok,ov in d_sets.items() if ok != k)
d_bad[k] = ''.join(str(e) for e in v if not self._test_component(e, others))

`constrain()`

``````def constrain(solution, *constraints):
queue = set(constraints)
while (queue):
constraint = queue.pop()
values = solution[cell]
values = values.replace(choice, '')
if (not values):
return False
if (solution[cell] == values):
continue
solution[cell] = values
queue.update(d_constraints[cell])
return solution``````

`Set.apply()`

``````class Set(Constraint):
def apply(self, solution):
# For each cell:
for c in self.cells:
# If a cell has only one possible value, remove that value from all other cells
if (len(solution[c]) != 1): continue
for c2 in self.cells:

Looking at this code, we can make several observations:

• It’s probably not very efficient for `Constraints` to return dictionaries of items to remove from the solution. We might try returning lists of (cell ID, cell possibilities) pairs instead, in which the possibilities returned are all those not precluded by the constraint.
• `Constraint.apply()` creates a `d_sets` dictionary; a list would work as well for its purposes
• The `constrain()` function processes all `Constraints` associated with any cell that it changes, including the constraint that triggered the change. This is pretty silly, but done because the `apply()` method of `Set`s does not necessarily eliminate all illegal values when it’s invoked (though it will always eliminate some). This is a peculiarity of this particular class; if it were corrected, `constrain()` could be altered to eliminate some double-processing of `Constraints`.

Simple Things First

Let’s eliminate the `d_sets` dictionary from `Constraint.apply()`, and see if that gets us anywhere.

``````class Constraint(object):
# ... snip ...
def apply(self, solution):
l_sets = [(c, tuple(map(int, solution[c]))) for c in self.cells]; d_bad = {}
for k,v in l_sets:
others = tuple(ov for ok,ov in l_sets if ok != k)
d_bad[k] = ''.join(str(e) for e in v if not self._test_component(e, others))

It doesn’t. If there’s any improvement at all, I can’t detect it. Still, no harm done, so let’s leave the change and move on.

Good Lists

Let’s try returning good lists, instead of bad dictionaries. We’ll have to change four pieces of code to make this work:

`Constraint.apply()`

``````class Constraint(object):
# ... snip ...
def apply(self, solution):
l_sets = [(c, tuple(map(int, solution[c]))) for c in self.cells]; l_good = []
for k,v in l_sets:
others = tuple(ov for ok,ov in l_sets if ok != k)
l_good.append((k, ''.join(str(e) for e in v if self._test_component(e, others))))
return l_good``````

`Assert.apply()`

``````class Assert(Constraint):
def apply(self, solution):
v = str(self.value)
return ((c, v) for c in self.cells)``````

`Set.apply()`

``````class Set(Constraint):
def apply(self, solution):
# For each cell:
l_good = [[c, solution[c]] for c in self.cells]
for c in self.cells:
# If a cell has only one possible value, remove that value from all other cells
if (len(solution[c]) != 1): continue
for e in l_good:
if (e != c): e = e.replace(solution[c], '')
return l_good``````

`constrain()`

``````def constrain(solution, *constraints):
queue = set(constraints)
while (queue):
constraint = queue.pop()
for cell, values in constraint.apply(solution):
if (not values):
return False
if (solution[cell] == values):
continue
solution[cell] = values
queue.update(d_constraints[cell])
return solution``````

This seems to buy us something; it appears that we save about 0.02s per loop (a change from ~0.235s/loop before the change to ~0.215s/loop afterwards). It’s difficult to measure, but it looks like we might pick up 9% here.

Smarter Sets

Let’s try re-coding `Set.apply()` to eliminate every prohibited value, and see what the effects are.

``````class Set(Constraint):
def _remove(self, l_good, cell, value):
for p in l_good:
if (p == cell): continue
if (value in p):
p = p.replace(value, '')
if (len(p) == 1):
self._remove(l_good, *p)
def apply(self, solution):
# For each cell:
l_good = [[c, solution[c]] for c in self.cells]
for c,v in l_good:
# If a cell has only one possible value, remove that value from all other cells
if (len(v) == 1): self._remove(l_good, c, v)
return l_good``````

This appears to be a big win; per-loop times are now hovering aroung ~.165s/loop, for perhaps another 20% gain. Now, let’s try altering `constrain()` s.t. it doesn’t re-process `Constraints`:

``````def constrain(solution, *constraints):
queue = set(constraints)
while (queue):
constraint = queue.pop()
for cell, values in constraint.apply(solution):
if (not values):
return False
if (solution[cell] == values):
continue
solution[cell] = values
queue.update(d_constraints[cell])
return solution``````

This change sends per-loop times down towards .14/loop, which is another 15% pickup.

Conclusion

Back-to-back tests with the original version of the solver indicate that our optimizations reduced per-loop time from ~0.48s to ~0.14s. This is a 70% reduction in runtime, or a 3.5x speedup. The final `hotshot` stats look like this:

``````         23969 function calls (23682 primitive calls) in 467.398 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
407  143.526    0.353  330.906    0.813 neknek_1.py:54(apply)
5692   69.164    0.012  171.720    0.030 neknek_1.py:58()
3323   41.876    0.013   41.876    0.013 neknek_1.py:16(can_make_sum_p)
745   39.288    0.053   76.232    0.102 neknek_1.py:102(apply)
2122/1967   36.944    0.017   36.944    0.019 neknek_1.py:95(_remove)
44   36.592    0.832  443.774   10.086 neknek_1.py:134(constrain)
3338   35.882    0.011   77.758    0.023 neknek_1.py:70(_test_component)
3921   15.660    0.004   15.660    0.004 neknek_1.py:57()
783/780    9.552    0.012    9.611    0.012 neknek_1.py:27(can_make_product_p)
845    9.416    0.011   19.027    0.023 neknek_1.py:84(_test_component)
44/1    7.007    0.159  404.628  404.628 neknek_1.py:152(search)
1213    6.120    0.005    6.120    0.005 neknek_1.py:157()
598    3.737    0.006    3.737    0.006 neknek_1.py:77(_test_component)
66/2    2.674    0.041  403.959  201.979 neknek_1.py:159()
1    2.401    2.401  465.210  465.210 neknek_1.py:122(solve)
193    2.034    0.011    2.034    0.011 neknek_1.py:91(_test_component)
1    1.281    1.281    2.084    2.084 neknek_1.py:113(__init__)

... clipped ...``````

It’s difficult to stop an exercise like this; one naturally looks at `Constraint.apply()`, and wonders what the heck it’s doing that’s taking so long. Is it the `int` <-> `string` conversions? Is there a way around them? (It does look like you can pick up ~20% if you memoize these conversions.)

But I suppose this has to stop somewhere, and it’s stopping here. For now. I hope you found it interesting. Please feel free to examine the final code.

This entry was posted in Projects, Python. Bookmark the permalink.