Synchronizing Ordered Lists

How can two ordered lists be synchronized with the minimum of insert and delete operations, given that only those operations are allowed? Let’s take a cut at this problem, using Python lists for illustration.

(Editorial note: There’s probably an elegant, well-known solution to this problem, but I can’t find it right now.)

Terms

Let’s clarify the problem a bit: Our goal is to produce one or more lists of operations that, when applied to a first, subject list will render it identical to a second, target list. Both the subject and target lists are ordered, and may contain duplicates. The subject and target lists may be of different lengths.

Representation

To begin with, let’s set out how we’ll represent the results of our algorithm: the insert and delete operations to be performed.

An insert will be represented as an (index, value) tuple, where index represents the final position of the element after synchronization. I.e., if we need to insert a 5 to create the final list [1, 3, 5, 7], the insert would always be represented by the tuple (2, 5), irrespective of any other operations to be performed during the synchronization.

A delete will be represented as an index, representing the original position of the element before synchronization. I.e., if we need to remove a 4 from the original list [2, 4, 6, 8], the delete would always be represented by the index 1, irrespective of any other operations to be performed during the synchronization.

Obviously, operations can’t be applied in an arbitrary order. The order in which the operations produced by this algorithm are intended to be applied is:

  • First, all deletes, ordered from the highest to the lowest index
  • Then, all inserts, ordered from the lowest to the highest index

Code

Here’s the algorithm in its entirety:

def sync_ordered_list(a, b):
    # a is the subject list
    # b is the target list
    # x is the "current position" in the subject list
    # y is the "current position" in the target list
    # i is the list of inserts
    # d is the list of deletes

    # Initialize iterators and accumulators
    x = 0; y = 0; i = []; d = []

    # While there are unexamined elements in either the subject or target list:
    while (x < len(a)) or (y < len(b)):

	# If the target list is exhausted,
	# delete the current element from the subject list
        if y >= len(b): d.append(x); x += 1

	# O/w, if the subject list is exhausted,
	# insert the current element from the target list
        elif x >= len(a): i.append((y, b[y])); y += 1

	# O/w, if the current subject element precedes the current target element,
	# delete the current subject element.
        elif a[x] < b[y]: d.append(x); x += 1

	# O/w, if the current subject element follows the current target element,
	# insert the current target element.
        elif a[x] > b[y]: i.append((y, b[y])); y += 1

	# O/w the current elements match; consider the next pair
        else: x += 1; y += 1

    # Return accumulators
    return (i,d)

Analysis

To understand why this works, assume that for any pass through the loop the operations in i and d, when applied to a[:x], will produce b[:y]. (This is trivially true when x and y are 0, and i and d are empty.) Therefore:

  • If both lists are exhausted, you’re done.
  • O/w, if the target list is exhausted, a[x] can’t appear in it, and should be deleted. The operations in i and d now synchronize a[:x+1] and b[:y]. Increment x and loop. Note that our loop invariant holds on the next iteration.
  • O/w, if the subject list is exhausted, b[y] can’t appear in it, and should be inserted. The operations in i and d now synchronize a[:x] and b[:y+1]. Increment y and loop. Note that our loop invariant holds on the next iteration.
  • O/w, if a[x] is ordered before b[y], then a[x] cannot appear in b[y:], and should be deleted. The operations in i and d now synchronize a[:x+1] and b[:y]. Increment x and loop. Note that our loop invariant holds on the next iteration.
  • O/w, if a[x] is ordered after b[y], then b[y] cannot appear in a[x:], and must be inserted. The operations in i and d now synchronize a[:x] and b[:y+1]. Increment y and loop. Note that our loop invariant holds on the next iteration.
  • O/w, a[x] and b[y] match, so the operations in i and d actually synchronize a[:x+1] and b[:y+1]. Increment x and y and loop. Note that our loop invariant holds on the next iteration.

Efficiency

I’m not going to present an argument that the operation set produced by this algorithm is really minimal, because that doesn’t seem like it would be fun. It does seem pretty hard to see how any operation produced by this algorithm could be omitted, though.

Share and Enjoy:
  • Twitter
  • Facebook
  • Digg
  • Reddit
  • HackerNews
  • del.icio.us
  • Google Bookmarks
  • Slashdot
This entry was posted in Python. Bookmark the permalink.