Wednesday, February 20, 2013

Tree traversal - the Python loop

Computer Science states that every recursive function can be reimplemented with just a loop. So how about rewriting the previous recursive example?

Lets recall our data structure first:

class TreeNode(object):
    def __init__(self, name):
        self.children = list() = name

root = TreeNode("Root")
root.children.extend( (TreeNode("Fruits"), TreeNode("Berries")) )
root.children[0].children.extend( (TreeNode("Apple"), TreeNode("Peach")) )
root.children[1].children.extend( (TreeNode("Cherry"), TreeNode("Mullberry")) )
The task: Find all paths of the tree from root to leafs.

The looping solution

def traverse(node):
    to_crawl = deque()
    lastlevel = curlevel = 0
    to_crawl.append((curlevel, node))
    breadcrumb = list()

    while to_crawl:
        curlevel, curnode = to_crawl.popleft()
        while lastlevel and lastlevel >= curlevel:
            lastlevel -= 1
        yield breadcrumb
        to_crawl.extendleft([ (curlevel+1, c) for c in curnode.children ])
        lastlevel = curlevel
The idea is to use excellent deque data structure, which can be thought of a list that can be efficiently appended/popped from both left and right.

Again, lets use it:

for breadcrumb in traverse(root):
    print breadcrumb
Which will print:
['Root', 'Berries']
['Root', 'Berries', 'Mullberry']
['Root', 'Berries', 'Cherry']
['Root', 'Fruits']
['Root', 'Fruits', 'Peach']
['Root', 'Fruits', 'Apple']
Cautious reader will note that the tree is indeed traversed but in the reversed order. The reason is when one extends the deque from left, and consequently pops out items out (from left again) it creates LIFO order, effectively reversing the children processing order. The fix would be to originally insert the children to deque in the reverse order. So instead of:
to_crawl.extendleft([ (curlevel+1, c) for c in curnode.children ])
we can manually insert children in the reversed order like this:
to_crawl.extendleft([ (curlevel+1, curnode.children[i]) for i in xrange(len(curnode.children)-1, -1, -1) ])


Time: O(n) - same as with recursion.

Memory: O(log n)K + O(log n). First O(log n) for recursion depth, second O(log n) path buffer. About K - its an average number of children per node - our crawling buffer holds reference to all of the children for each node we encounter, so we need to account for that.

Same as with python recursion solution, but even better - no recursion here.

The only disadvantage I can spot is the fact that we hold references to the children of the nodes we crawl. On extremely shallow and wide trees this can raise the memory footprint almost to O(n). The recursive solution however would still provide O(log n) in this extreme case.