Let’s wrap up our hunt for performance improvements in our KenKen puzzle solver. This sort of thing can go on nearly endlessly, of course, but after today I think we’ll have gotten most of the easy stuff, and seen dramatic gains in speed for our trouble.
Recap
After our last changes, the top few items in the hotshot stats look like this:
28901 function calls (28655 primitive calls) in 771.505 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
692 183.663 0.265 464.785 0.672 neknek_1.py:54(apply)
44 180.457 4.101 749.238 17.028 neknek_1.py:128(constrain)
1027 103.779 0.101 103.779 0.101 neknek_1.py:94(apply)
2721 61.978 0.023 61.978 0.023 neknek_1.py:55(<genexpr>)
2643 55.001 0.021 191.786 0.073 neknek_1.py:58(<genexpr>)
... clipped ...
Analysis
To make sense of this, it’s necessary to have the code to hand; the version profiled above is available here. The execution time is now spread out a bit more between functions; we’re no longer seeing the major hotspots of the last two days. The majority of the execution time is now spent in three functions:
Constraint.apply()
class Constraint(object):
    # ... snip ...
    def apply(self, solution):
        d_sets = dict((c, tuple(map(int, solution[c]))) for c in self.cells); d_bad = {}
        for k,v in d_sets.items():
            others = tuple(ov for ok,ov in d_sets.items() if ok != k)
            d_bad[k] = ''.join(str(e) for e in v if not self._test_component(e, others))
        return d_bad
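Most of the time charged to apply() is actually spent asking _test_component() whether a candidate can still participate in a legal combination; the real implementations (can_make_sum_p and friends in the profile) aren’t reproduced in this post, but the idea for a sum cage looks roughly like the sketch below. The function name and structure here are mine, not the solver’s.

# Illustration only -- the real _test_component / can_make_sum_p in
# neknek_1.py is not shown in this post.  The sketch asks whether `value`,
# plus one candidate drawn from each of the other cells, can reach `target`.
def can_make_sum(target, value, others):
    # others: a tuple of tuples of the remaining cells' candidate ints
    if not others:
        return value == target
    first, rest = others[0], others[1:]
    return any(can_make_sum(target, value + o, rest) for o in first)

# A two-cell cage summing to 5, testing candidate 2 against the other
# cell's candidates (1, 2, 3, 4):
print(can_make_sum(5, 2, ((1, 2, 3, 4),)))   # True  (2 + 3 == 5)
print(can_make_sum(5, 9, ((1, 2, 3, 4),)))   # False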
constrain()
def constrain(solution, *constraints):
    queue = set(constraints)
    while (queue):
        constraint = queue.pop()
        for cell, bad_choices in constraint.apply(solution).items():
            values = solution[cell]
            for choice in bad_choices:
                values = values.replace(choice, '')
            if (not values):
                return False
            if (solution[cell] == values):
                continue
            solution[cell] = values
            queue.update(d_constraints[cell])
    return solution
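It’s worth spelling out the data model constrain() leans on, since the later changes depend on it. Reconstructing from the code (the cell keys below are invented, not taken from neknek_1.py): solution maps each cell to a string of its remaining candidate digits, and d_constraints maps each cell to the Constraints that mention it.

# Hypothetical illustration of the data constrain() operates on; the actual
# cell keys and board setup in neknek_1.py may differ.
solution = {
    (0, 0): '1234',   # cell (0, 0) may still be any of 1-4
    (0, 1): '1234',
}

# Eliminating "bad" choices is a string replace, exactly as in the loop above:
values = solution[(0, 0)]
for choice in '24':
    values = values.replace(choice, '')
print(values)        # -> '13'
print(not values)    # an empty string would mean a contradiction -> return False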
Set.apply()
class Set(Constraint):
    def apply(self, solution):
        # For each cell:
        d_bad = {}
        for c in self.cells:
            # If a cell has only one possible value, remove that value from all other cells
            if (len(solution[c]) != 1): continue
            for c2 in self.cells:
                if (c2 != c): d_bad[c2] = d_bad.get(c2,'') + solution[c]
        return d_bad
Looking at this code, we can make several observations:
- It’s probably not very efficient for Constraints to return dictionaries of items to remove from the solution. We might try returning lists of (cell ID, cell possibilities) pairs instead, in which the possibilities returned are all those not precluded by the constraint. Constraint.apply() creates a d_sets dictionary; a list would work as well for its purposes.
- The constrain() function processes all Constraints associated with any cell that it changes, including the constraint that triggered the change. This is pretty silly, but done because the apply() method of Sets does not necessarily eliminate all illegal values when it’s invoked (though it will always eliminate some). This is a peculiarity of this particular class; if it were corrected, constrain() could be altered to eliminate some double-processing of Constraints.
Simple Things First
Let’s eliminate the d_sets dictionary from Constraint.apply(), and see if that gets us anywhere.
class Constraint(object):
    # ... snip ...
    def apply(self, solution):
        l_sets = [(c, tuple(map(int, solution[c]))) for c in self.cells]; d_bad = {}
        for k,v in l_sets:
            others = tuple(ov for ok,ov in l_sets if ok != k)
            d_bad[k] = ''.join(str(e) for e in v if not self._test_component(e, others))
        return d_bad
It doesn’t. If there’s any improvement at all, I can’t detect it. Still, no harm done, so let’s leave the change and move on.
Good Lists
Let’s try returning good lists, instead of bad dictionaries. We’ll have to change four pieces of code to make this work:
Constraint.apply()
class Constraint(object):
    # ... snip ...
    def apply(self, solution):
        l_sets = [(c, tuple(map(int, solution[c]))) for c in self.cells]; l_good = []
        for k,v in l_sets:
            others = tuple(ov for ok,ov in l_sets if ok != k)
            l_good.append((k, ''.join(str(e) for e in v if self._test_component(e, others))))
        return l_good
Assert.apply()
class Assert(Constraint):
    def apply(self, solution):
        v = str(self.value)
        return ((c, v) for c in self.cells)
Set.apply()
class Set(Constraint):
    def apply(self, solution):
        # For each cell:
        l_good = [[c, solution[c]] for c in self.cells]
        for c in self.cells:
            # If a cell has only one possible value, remove that value from all other cells
            if (len(solution[c]) != 1): continue
            for e in l_good:
                if (e[0] != c): e[1] = e[1].replace(solution[c], '')
        return l_good
constrain()
def constrain(solution, *constraints):
    queue = set(constraints)
    while (queue):
        constraint = queue.pop()
        for cell, values in constraint.apply(solution):
            if (not values):
                return False
            if (solution[cell] == values):
                continue
            solution[cell] = values
            queue.update(d_constraints[cell])
    return solution
This seems to buy us something; it appears that we save about 0.02s per loop (~0.235s/loop before the change, ~0.215s/loop after). It’s difficult to measure, but it looks like we might pick up 9% here.
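As an aside, per-loop figures like these are easy enough to reproduce with timeit. The harness below is only a sketch: solve() does appear in the profile, but its exact signature and the puzzle argument are assumptions on my part.

import timeit

# Rough timing harness for the s/loop numbers quoted here.  solve() shows up
# in the profile above, but its signature and the puzzle definition are
# assumptions -- adjust to match the actual module.
n = 20
total = timeit.timeit('solve(puzzle)',
                      setup='from neknek_1 import solve, puzzle',
                      number=n)
print('%.3f s/loop' % (total / n))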
Smarter Sets
Let’s try re-coding Set.apply() to eliminate every prohibited value, and see what the effects are.
class Set(Constraint):
    def _remove(self, l_good, cell, value):
        for p in l_good:
            if (p[0] == cell): continue
            if (value in p[1]):
                p[1] = p[1].replace(value, '')
                if (len(p[1]) == 1):
                    self._remove(l_good, *p)
    def apply(self, solution):
        # For each cell:
        l_good = [[c, solution[c]] for c in self.cells]
        for c,v in l_good:
            # If a cell has only one possible value, remove that value from all other cells
            if (len(v) == 1): self._remove(l_good, c, v)
        return l_good
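The recursion is the interesting part: as soon as a cell drops to a single value, that value is scrubbed from the rest of the set in the same pass, which may pin further cells in turn. A tiny standalone demonstration of the cascade (cell names invented, logic mirroring _remove() above):

# Standalone demo of the cascading elimination in Set._remove; the cell
# names are invented for illustration.
def remove(l_good, cell, value):
    for p in l_good:
        if p[0] == cell: continue
        if value in p[1]:
            p[1] = p[1].replace(value, '')
            if len(p[1]) == 1:
                remove(l_good, *p)   # a newly fixed cell triggers more removals

l_good = [['A1', '1'], ['A2', '12'], ['A3', '123']]
remove(l_good, 'A1', '1')
print(l_good)   # -> [['A1', '1'], ['A2', '2'], ['A3', '3']]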
This appears to be a big win; per-loop times are now hovering around ~0.165s/loop, for perhaps another 20% gain. Now, let’s try altering constrain() so that it doesn’t re-process Constraints:
def constrain(solution, *constraints):
    queue = set(constraints)
    while (queue):
        constraint = queue.pop()
        for cell, values in constraint.apply(solution):
            if (not values):
                return False
            if (solution[cell] == values):
                continue
            solution[cell] = values
            queue.update(d_constraints[cell])
            queue.discard(constraint)
    return solution
This change sends per-loop times down towards ~0.14s/loop, which is another 15% pickup.
Conclusion
Back-to-back tests with the original version of the solver indicate that our optimizations reduced per-loop time from ~0.48s to ~0.14s. This is roughly a 70% reduction in runtime, or about a 3.4x speedup. The final hotshot stats look like this:
23969 function calls (23682 primitive calls) in 467.398 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
407 143.526 0.353 330.906 0.813 neknek_1.py:54(apply)
5692 69.164 0.012 171.720 0.030 neknek_1.py:58(<genexpr>)
3323 41.876 0.013 41.876 0.013 neknek_1.py:16(can_make_sum_p)
745 39.288 0.053 76.232 0.102 neknek_1.py:102(apply)
2122/1967 36.944 0.017 36.944 0.019 neknek_1.py:95(_remove)
44 36.592 0.832 443.774 10.086 neknek_1.py:134(constrain)
3338 35.882 0.011 77.758 0.023 neknek_1.py:70(_test_component)
3921 15.660 0.004 15.660 0.004 neknek_1.py:57(<genexpr>)
783/780 9.552 0.012 9.611 0.012 neknek_1.py:27(can_make_product_p)
845 9.416 0.011 19.027 0.023 neknek_1.py:84(_test_component)
44/1 7.007 0.159 404.628 404.628 neknek_1.py:152(search)
1213 6.120 0.005 6.120 0.005 neknek_1.py:157(<genexpr>)
598 3.737 0.006 3.737 0.006 neknek_1.py:77(_test_component)
66/2 2.674 0.041 403.959 201.979 neknek_1.py:159(<genexpr>)
1 2.401 2.401 465.210 465.210 neknek_1.py:122(solve)
193 2.034 0.011 2.034 0.011 neknek_1.py:91(_test_component)
1 1.281 1.281 2.084 2.084 neknek_1.py:113(__init__)
... clipped ...
It’s difficult to stop an exercise like this; one naturally looks at Constraint.apply(), and wonders what the heck it’s doing that’s taking so long. Is it the int <-> string conversions? Is there a way around them? (It does look like you can pick up ~20% if you memoize these conversions.)
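If you’re curious, the memoization might look something like the sketch below; this is the idea only, not the solver’s code. The same handful of candidate strings recur constantly, so caching their tuple-of-ints form saves most of the repeated int() calls.

# Sketch only -- not the code actually used in neknek_1.py.  Candidate
# strings like '1234' recur constantly, so caching their int-tuple form
# avoids repeating the int() conversions in Constraint.apply().
_int_cache = {}

def as_ints(s):
    try:
        return _int_cache[s]
    except KeyError:
        v = _int_cache[s] = tuple(map(int, s))
        return v

# In apply(), l_sets could then be built as:
#     l_sets = [(c, as_ints(solution[c])) for c in self.cells]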
But I suppose this has to stop somewhere, and it’s stopping here. For now. I hope you found it interesting. Please feel free to examine the final code.