programming in the
twenty-first century

It's not about technology for its own sake. It's about being able to implement your ideas.

Trapped! Inside a Recursive Data Structure

Flat lists are simple. That's what list comprehensions are designed to work with, for example. Code for scanning or transforming a flat list can usually be tail recursive. Once data becomes deep, where elements of a list can contain other lists ad infinitum something changes. It's trivial to iterate over a deep list; any basic Scheme textbook covers this early on. You recurse down, down, down, counting up values, building lists, and then....trapped. You're way down inside a function, and all you really want to do is exit immediately or record some data that applies to the whole nested data structure and keep going, but you can't.

As an example, here's the standard "is X contained in a list?" function written in Erlang:

member(X, [X|_]) -> true;
member(X, [_|T]) -> member(X, T);
member(_, [])    -> false.

Once a match is found, that's it. A value of true is returned. A function to find X in a deep list takes a bit more work:

member(X, [X|_]) ->
   true;
member(X, [H|T]) when is_list(H) ->
   case member(X, H) of
      true -> true;
      _    -> member(X, T)
   end;
member(X, [_|T]) ->
   member(X, T);
member(X, []) ->
   false.

The ugly part here is that you could be down 50 levels in a deep list when a match is found, but you're trapped. You can't just immediately stop the whole operation and say "Yes! Done!" You've got to climb back up those 50 levels. That's the reason for checking for "true" in the second function clause. Now this example is mild in terms of claustrophobic trappage, but it can be worse, and you'll know it when you run into such a case.

There are a couple of options here. One is to throw an exception. Another is to use continuation passing style. But there's a third approach which I think is cleaner: manage a stack yourself instead of using the function call stack. This keeps the function tail recursive, making it easy to exit or handle counters or accumulators across the whole deep data structure.

Here's member for deep lists written with an explicit stack:

member(X, L) -> member(X, L, []).
member(X, [X|_], _Stack) ->
   true;
member(X, [H|T], Stack) when is_list(H) ->
   member(X, H, [T|Stack]);
member(X, [_|T], Stack) ->
   member(X, T, Stack);
member(_, [], []) ->
   false;
member(X, [], [H|T]) ->
   member(X, H, T).

Whenever the head of the list is a list itself, the tail is pushed onto Stack so it can be continued with later, and the list is processed. When there's no more input, check to see if Stack has any data on it. If so, pop the top item and make it the current list. When a match is found, the exit is immediate, because there aren't any truly recursive calls to back out of.

Would I really write member like this? Probably not. But I've found more complex cases where this style is much less restrictive than writing a truly recursive function. One of the signs that this might be useful is if you're operating across a deep data structure as a whole. For example, counting the number of atoms in a deep list. Or taking a deep data structure and transforming into into one that's flat.

permalink December 1, 2007

previously

archives

twitter / mail

I'm James Hague, a recovering programmer who has been designing video games since the 1980s. Programming Without Being Obsessed With Programming and Organizational Skills Beat Algorithmic Wizardry are good starting points. For the older stuff, try the 2012 Retrospective.

Where are the comments?