programming in the
twenty-first century

It's not about technology for its own sake. It's about being able to implement your ideas.

How I Learned to Stop Worrying and Love Erlang's Process Dictionary

The rise of languages based upon hash tables is one of the great surprises in programming over the last twenty years.

Whatever you call them--hash tables, dictionaries, associative arrays, hash maps--they sure are useful. A majority of my college data structures courses are immediately negated. If you've got pervasive hash table support, then you've also got arrays (just hash tables with integer keys), growable arrays (ditto), sparse arrays (again, same thing), and it's rare to have to bother with binary trees, red-black trees, tries, or anything more complex. Okay, sets are still useful, but they're hash tables where only the keys matter. I'd even go so far as to say that with pervasive hash table support it's unusual to spend time thinking about data structures at all. It's key/value pairs for everything. (And if you need convincing, study Peter Norvig's Sudoku solver.)

If you don't believe the theory that functional programming went mainstream years ago, then at least consider that the mismatch between dictionary-based programming and functional programming has dealt a serious blow to the latter.

Now, sure, it's easy to build a purely functional dictionary in Erlang or Haskell. In fact, such facilities are already there in the standard libraries. It's using them that's clunky. Reading a value out of a dictionary is straightforward enough, but the restrictions of single-assignment, and that "modifying" a hash table returns a new version, calls for a little puzzle solving.

I can write down any convoluted sequence of dictionary operations:

Add the value of the keys "x" and "y" and store them in key "z". If "z" is greater than 100, then also set a key called "overflow" and add the value of "extra" to "x." If "x" is greater than 326, then set "finished" to true, and clear "y" to zero.

and in Python, Ruby, Lua, or Perl, minimal thought is required to write out a working solution. It's just a sequence of operations that mimic their textual descriptions. Here's the Python version, where the dictionary is called "d":

d['z'] = d['x'] + d['y']
if d['z'] > 100:
   d['overflow'] = True
   d['x'] += d['extra']
if d['x'] > 326:
   d['finished'] = True
   d['y'] = 0

I can certainly write the Erlang version of that (and, hey, atoms, so no quotes necessary!), but I can't do it so mindlessly. (Go ahead and try it, using either the dict or gb_trees modules.) I may be able to come up with Erlang code that's prettier than Python in the end, but it takes more work, and a minor change to the problem definition might necessitate a full restructuring of my solution.

Well, okay, no, the previous paragraph isn't true at all. I can bang-out an Erlang version that closely mimics the Python solution. And as a bonus, it comes with the big benefit of Erlang: the hash table is completely isolated inside of a single process. All it takes is getting over the psychological hurdle--and the lecturing from purists--about using the process dictionary. (You can use the ets module to reach the same end, but it takes more effort: you need to create the table first, and you have to create and unpack key/value pairs yourself, among other quirks.)

Let me come right out and say it: it's okay to use the process dictionary.

Clearly the experts agree on this, because every sizable Erlang program I've looked at makes use of the process dictionary. It's time to set aside the rote warnings of how unmaintainable your code will be if you use put and get and instead revel in the usefulness of per-process hash tables. Now what's really being preached against, when the horrors of the process dictionary are spewed forth, is using it as a way to sidestep functional programming, weaving global updates and flags through your code like old-school BASIC. But that's the extreme case. Here's a list of situations where I've found the process dictionary to be useful, starting from the mildest of instances:

Low-level, inherently stateful functions. Random number generation is the perfect example, and not too surprisingly the seed that gets updated with each call lives in the process dictionary.

Storing process IDs. Yes, you can use named processes instead, and both methods keep you from having to pass PIDs around, but named processes are global to the entire Erlang node. Use the process dictionary instead and you can start multiple instances of the same application without conflict.

Write-once process parameters. Think of this as an initial configuration step. Stuff all the settings that will never change within a process in the dictionary. From a programming point of view they're just like constants, so no worries about side effects.

Managing data in a tail recursive server loop. If you done any Erlang coding you've written a tail-recursive server at some point. It's a big receive statement, where each message handling branch ends with a recursive call. If there are six parameters, then each of those calls involves six parameters, usually with one of them changed. If you add a new parameter to the function, you've got to find and change each of the recursive calls. Eventually it makes more sense to pack everything into a data structure, like a gb_tree or ets table. But there's nothing wrong with just using the simpler process dictionary for key/value pairs. It doesn't always make sense (you might want the ability to quickly roll back to a previous state), but sometimes it does.

Handling tricky data flow in high-level code. Sometimes trying to be pure is messy. Nice, clean code gets muddled by having to pass around some data that only gets used in exceptional circumstances. All of a sudden functions have to return tuples intead of simple values. All of a sudden there's tangential data hitchhiking through functions, not being used directly. And when I start going down this road I find myself getting frustrated and annoyed, jumping through hoops to do something that I wouldn't even care about in most languages. Making flags or key elements of data globally accessible, is a huge sigh of relief, and the excess code melts away.

(If you've read this and are horrified that I'm completely misunderstanding functional programming, remember that I've gone further down the functional path than most people for what appear to be state-oriented problems.)

permalink December 9, 2009

previously

archives

twitter / mail

I'm James Hague, a recovering programmer who has been designing video games since the 1980s. Programming Without Being Obsessed With Programming and Organizational Skills Beat Algorithmic Wizardry are good starting points. For the older stuff, try the 2012 Retrospective.

Where are the comments?