Coding as Performance

I want to talk about performance coding. Not coding for speed, but coding as performance, a la live coding. Okay, I don't really want to talk about that either, as it mostly involves audio programming languages used for on-the-fly music composition, but I like the principle of it: writing programs very quickly, on the timescale of a TV show or movie rather than the years it can take to complete a commercial product. Take any book on agile development or extreme programming and replace "weeks" with "hours" and "days" with "minutes."

Think of it in terms of a co-worker or friend who comes to you with a problem, something that could be done by hand, but would involve much repetitive work ("I've got a big directory tree, and I need a list of the sum total sizes of all files with the same root names, so hello.txt, hello.doc, and hello.whatever would just show in the report as 'hello', followed by the total size of those three files"). If you can write a program to solve the problem in less time than the tedium of slogging through the manual approach, then you win. There's no reason to limit this game to this kind of problem, but it's a starting point.
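To give a sense of scale, here is a minimal Python sketch of that file-size report. It is an illustration under some assumptions, not a definitive solution: the names total_sizes_by_root and report are made up for this example, "root name" is taken to mean everything before the last dot, and files with the same root are lumped together no matter which subdirectory they live in.

```python
import os
from collections import defaultdict

def total_sizes_by_root(top):
    """Walk the tree under `top` and sum file sizes, grouped by root name.

    Assumption: the root name is the file name minus its last extension,
    so hello.txt and hello.doc both count toward 'hello'.
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            root, _ext = os.path.splitext(name)
            totals[root] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)

def report(top):
    """Print one line per root name: the name, a tab, the total size in bytes."""
    for root, size in sorted(total_sizes_by_root(top).items()):
        print(f"{root}\t{size}")
```

Calling report(".") prints the report for the current directory. There's no error handling (a broken symbolic link would raise an exception, for instance), which is the kind of corner this game lets you cut.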

Working at this level, the difference between gut instinct and proper engineering becomes obvious. The latter always seems to involve additional time--architecture, modularity, code formatting, interface specification--which is exactly what's in short supply in coding as performance. Imagine you want to plant a brand new vegetable garden somewhere in your yard, and the first task is to stake out the plot. Odds are good that you'll be perfectly successful by just eyeballing it, hammering a wooden stake at one corner, and using it as a reference. Or you could be more formal and use a tape measure. The ultimate, guaranteed correct solution is to hire a team of surveyors to make sure the distances are exact and the sides perfectly parallel. But really, who would do that?

(And if you're thinking "not me," consider people like myself who've grepped a two-hundred megabyte XML file, because it was easier than remembering how to use the available XML parsing libraries. If your reaction is one of horror because I clearly don't understand the whole purpose of using XML to structure data, then there you go. You'd hire the surveyors.)

You can easily spot the programming languages designed for projects operating on shorter timescales. Common, non-trivial operations are built-in, like regular expressions and matrix math (as an aside, the original BASIC language from the 1960s had matrix operators). Common functions--reading a file, getting the size of a file--don't require importing libraries, so you never have to remember that getting the size of a file isn't a core operation in the "file" library and instead lives in "os:file:filesize" or wherever the hierarchical-thinking author put it. But really, any language of the Python or Ruby class is going to be fine. The big wins are having an interactive read / evaluate / print loop, zero compilation time, and data structures that don't require thinking about low-level implementation details.

What matter just as much are visualization tools, so you can avoid the classic pitfall of engineering something for weeks or months only to finally realize that you didn't understand the problem and engineered the wrong thing. (Students of Dijkstra are ready with some good examples of math problems where attempting to guess an answer based on a drawing gives hopelessly incorrect answers, but I'll pretend I don't see them, there in the back, frantically waving their arms.)

I once used an 8-bit debugger with an interrupt-driven display. Sixty times per second, the display was updated. This meant that memory dumps were live. If a running program constantly changed a value, that memory location showed as blurred digits on the screen. You could also see numbers occasionally flick from 0 to 255, then back later. Static parts of the screen meant nothing was changing there. This sounds simple, but wow was it useful for accidentally spotting memory overruns and logic errors. Often I never suspected a problem, and I wouldn't have even known what to look for, but I found an error just by seeing movement or patterns in a memory dump that didn't look right.

A modern visualization tool I can't live without is RegEx Coach. I always try out regular expressions using it before copying them over to my Perl or Python scripts. When I make an error, I see it right away. That prevents situations where the rest of my program is fine, but a botched regular expression isn't pulling in exactly the data I'm expecting.
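The same try-it-first habit can be approximated at an interactive prompt. The following is a rough Python stand-in, not RegEx Coach itself: the helper name try_pattern, the sample lines, and the pattern are all invented for illustration.

```python
import re

def try_pattern(pattern, samples):
    """Return (sample, captured groups or None) for each sample line."""
    compiled = re.compile(pattern)
    results = []
    for line in samples:
        match = compiled.search(line)
        results.append((line, match.groups() if match else None))
    return results

# Hypothetical sample data: a file name followed by a size.
samples = ["hello.txt 1024", "hello.doc   12", "README 40"]
for line, groups in try_pattern(r"^(\w+)\.\w+\s+(\d+)$", samples):
    print(f"{line!r:20} -> {groups}")
```

The third sample has no extension, so it comes back as None, and that's the kind of miss worth catching before the pattern gets pasted into a real script.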

The J language ships with some great visualization tools. Arguably it's the nicest programming environment I've ever used, even though I go back and forth about whether J itself is brilliant or insane. There's a standard library module which takes a matrix and displays it as a grid of colors. Identical values use the same color. Simplistic? Yes. But this display format makes patterns and anomalies jump out of the screen. If you're thinking that you don't write code that involves matrix math, realize that matrices are native to J and you can easily put all sorts of data into a matrix format (in fact, the preferred term for a matrix in J is the more casual "table").
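For readers without J installed, the idea can be sketched in a few lines of Python. This is only an approximation of the concept, not J's module: the function name grid_view is hypothetical, and text symbols stand in for colors.

```python
def grid_view(table, symbols=".#o+x*%@"):
    """Render a table (a list of rows) as text, one symbol per distinct value.

    Identical values always get the same symbol. If there are more distinct
    values than symbols, the symbols are reused, so keep the palette large
    enough for the data at hand.
    """
    assigned = {}
    lines = []
    for row in table:
        cells = []
        for value in row:
            if value not in assigned:
                assigned[value] = symbols[len(assigned) % len(symbols)]
            cells.append(assigned[value])
        lines.append("".join(cells))
    return "\n".join(lines)

# A mostly regular table with one stray value in the middle row.
table = [
    [0, 1, 0, 1],
    [0, 1, 7, 1],
    [0, 1, 0, 1],
]
print(grid_view(table))
```

Even in this crude form, the stray 7 shows up as the one symbol that breaks the column pattern.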

J also has a similar tool that mimics a spreadsheet display. Pass in data, and up pops what looks like an Excel window, making it easy to view data that is naturally columnar. It's easier than dumping values to an HTML file or the old-fashioned method of debug printing a table using a fixed-width font. There's also an elaborate module for graphing data; no need to export it to a file and use a standalone program.

I'm hardly suggesting that everyone--or anyone--switch over to J. It's not the language semantics that matter so much as tools that are focused on interactivity, on working through problems quickly. And the realization that it is valid to get an answer without always bringing the concerns of software engineering--and the time penalty that comes with them--into the picture.

May 31, 2008
