I'm a recovering programmer who has been designing video games since the 1980s, doing things that seem baroquely hardcore in retrospect, like writing Super Nintendo games entirely in assembly language. These days I use whatever tools are the most fun and give me the biggest advantage.
Timings and the Punchline

I forgot two things in Revisiting "Programming as if Performance Mattered": exact timings of the different versions of the code and a punchline. I'll do the timings first.
timer:tc falls apart once code gets too fast. A classic sign of this is running consecutive timings and getting back a sequence of numbers like 15000, 31000, 31000, 15000. At that point you should write a loop that executes the test function, say, 100 times, then divide the total execution time by 100. This smooths out interruptions from garbage collection, system processes, and so on.
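The loop-and-divide approach can be sketched in a few lines of Erlang. The module and function names here are my own invention, not anything from the decoder:

```erlang
%% Minimal sketch of averaging timer:tc over many runs.
%% bench and avg_usec are hypothetical names.
-module(bench).
-export([avg_usec/2]).

%% Run Fun N times inside a single timer:tc call and return
%% the average number of microseconds per call.
avg_usec(Fun, N) when N > 0 ->
    {TotalUsec, ok} = timer:tc(fun() -> loop(Fun, N) end),
    TotalUsec / N.

loop(_Fun, 0) -> ok;
loop(Fun, N) -> Fun(), loop(Fun, N - 1).
```

Called as, for example, `bench:avg_usec(fun() -> some_decoder:decode(Data) end, 100)`, this gives a per-call figure that is far more stable than a single timer:tc run.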
And now the timings (lower is better). The TGA image decoder with the clunky binary / list / binary implementation of decode_rgb, on the same sample image I used in 2004:
Were I using this module in production code, I'd do one of three things. If I'm only decoding a handful of images here and there, then this whole discussion is irrelevant; the Erlang code is more than fast enough. If image decoding is a huge bottleneck, I'd move the hotspot, decode_rgb, into a small linked-in driver. Or, and the cries of cheating may be justified here, I'd remove decode_rgb entirely. Remember, transparent pixel runs at the start and end of each row are already detected elsewhere. decode_rgb blows up the runs in the middle from 24-bit to 32-bit. At some point this needs to be done, but it may not need to happen at the Erlang level at all. If the pixel data is passed off to another, non-Erlang process anyway, maybe for rendering or for printing or some other operation, then there's no reason the compressed 24-bit data can't be passed along directly. That fits the style I've been using for this whole module: operating on compressed data without a separate decompression step.
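For concreteness, the expansion that decode_rgb performs could look something like this one-liner. This is a sketch of the idea, not the article's actual code, and it assumes TGA's usual BGR byte order:

```erlang
%% Hypothetical sketch: expand a run of 24-bit BGR pixels into
%% 32-bit BGRA pixels by appending an opaque alpha byte to each.
%% This per-pixel blow-up is exactly the work that disappears if
%% the consumer accepts the compressed 24-bit runs directly.
decode_rgb(Pixels) ->
    << <<B, G, R, 255>> || <<B, G, R>> <= Pixels >>.
```

Skipping this step means the Erlang side never touches individual pixels at all; it just hands the compressed runs to whatever non-Erlang code consumes them.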
But now we're getting into useless territory: quibbling over microseconds without any actual context. You can't feel the difference between any of the optimized versions of the code I presented last time, and so it doesn't matter.