Tales of a Former Disassembly Addict

Like many people who learned to program on home computers in the 1980s, I started with interpreted BASIC and moved on to assembly language. I've seen several comments over the years--including one from Alan Kay--less than thrilled with the 8-bit department store computer era, viewing it as a rewind to a more primitive time in programming. That's hard to argue against, as the decade prior to the appearance of the Apple II and Atari 800 had resulted in Modula-2, Smalltalk, Icon, Prolog, Scheme, and some top notch optimizing compilers for Pascal and C. Yet an entire generation happily ignored all of that and became bit-bumming hackers, writing thousands of games and applications directly at the 6502 and Z80 machine level with minimal operating system services.

There wasn't much of a choice.

In Atari BASIC, this statement:
PRINT SIN(1)
was so slow to execute that you could literally say "dum de dum" between pressing the enter key and seeing the result. Assembly language was the only real option if you wanted to get anywhere near what the hardware was capable of. That there was some amazing Prolog system on a giant VAX did nothing to change this. And those folks who had access to that system weren't able to develop fast graphical games that sold like crazy at the software store at the mall.

I came out of that era being very sensitive to what good low-level code looked like, and it was frustrating.

I'd routinely look at the disassembled output of Pascal and C compilers and throw up my hands. It was often as if the code was some contrived example in Zen of Assembly Language, just to show how much opportunity there was for optimization. I'd see pointless memory accesses, places where comparisons could be removed, a half dozen lines of function entry/exit code that wasn't needed.

And it's often still like that, even though the party line is that compilers can out-code most humans. Now I'm not arguing against the overall impressiveness of compiler technology; I remember trying to hand-optimize some SH4 code and I lost to the C compiler every time (my code was shorter, but not faster). But it's still common to see compilers where this results in unnecessarily bulky code:
*p++ = 10; *p++ = 20; *p++ = 30; *p++ = 40;
while this version ends up much cleaner:
p[0] = 10; p[1] = 20; p[2] = 30; p[3] = 40; p += 4;
I noticed that under OS X a few years ago--and this may certainly still be the case with the Snow Leopard C compiler--that every access to a global variable resulted in two fetches from memory: one to get the address of the variable, one to get the actual value.

Don't even get me started about C++ compilers. Take some simple-looking code involving objects and overloaded operators, and I can guarantee that the generated code will be filled with instructions to copy temporary objects all over the place. It's not at all surprising if a couple of simple lines of source turn into fifty or a hundred of assembly language. In fact, generated code can be so ridiculous and verbose that I finally came up with an across-the-board solution which works for all compilers on all systems:

I don't look at the disassembled output.

If you've read just a couple of entries in this blog, you know that I use Erlang for most of my personal programming. As a mostly-interpreted language that doesn't allow data structures to be destructively modified, it's no surprise to see Erlang in the bottom half of any computationally intensive benchmark. Yet I find it keeps me thinking at the right level. The goal isn't to send as much data as possible through a finely optimized function, but to figure out how to have less data and do less processing on it.

In the mid-1990s I wrote a 2D game and an enhanced version of the same game. The original had occasional--and noticeable--dips in frame rate on low-end hardware, even though I had optimized the sprite drawing routines to extreme levels. The enhanced version didn't have the same problem, even though the sprite code was the same. The difference? The original just threw dozens and dozens of simple-minded attackers at the player. The enhanced version had a wider variety of enemy behavior, so the game could be just as challenging with fewer attackers. Or more succinctly: it was drawing fewer sprites.

I still see people obsessed with picking a programming language that's at the top of the benchmarks, and they obsess over the timing results the way I used to obsess over disassembled listings. It's a dodge, a distraction...and it's irrelevant.