On the Perils of Benchmarking Erlang

2007 brought a lot of new attention to Erlang, and with that attention has come a flurry of impromptu benchmarks. Benchmarks are tricky to write if you're new to a language, because it's easy for the run-time to be dominated by something quirky and unexpected. Consider a naive Python loop that appends data to a string each iteration: strings are immutable in Python, so each append causes the entire string created thus far to be copied. Here's my short, but by no means complete, guide to pitfalls in benchmarking Erlang code.
Startup time is slow. Erlang's startup time is more significant than that of the other languages I use. Remember, Erlang is a whole system, not just a scripting language: a suite of modules that make sense in most applications is loaded by default. If you're running small benchmarks, startup time can easily dwarf your timings.
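One way to keep startup cost out of the numbers is to time the code from inside an already-running node. A minimal sketch, using timer:tc with a placeholder workload (the summation here stands in for whatever you're actually measuring):

```erlang
%% Run from the shell of an already-started Erlang node, so node
%% startup and default module loading are excluded from the timing.
%% The fun below is a stand-in for the real benchmark body.
Bench = fun() -> lists:sum(lists:seq(1, 1000000)) end,
{Micros, _Result} = timer:tc(Bench),
io:format("~p microseconds~n", [Micros]).
```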
Garbage collection happens frequently in rapidly growing processes. An Erlang process starts out very small, to keep the overall memory footprint low in a system with potentially tens of thousands of processes. Once a process's heap fills up, it is promoted to a larger size, which means allocating a new block of memory and copying all live data over to it. Eventually the heap size stabilizes, and at some point the system automatically switches the process over to a generational garbage collector, but during that initial burst of growth from a few hundred words to a few hundred kilowords, garbage collection happens many times.
To get around this, you can start a process with a specific heap size using spawn_opt. The min_heap_size option lets you choose an initial heap size in words. Even a value of 32K can significantly improve the timings of some benchmarks. There's no need to worry about getting the size exactly right, because the heap will still be expanded automatically as needed.
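A sketch of spawning a benchmark process with a 32K-word initial heap (the list-building workload is just a placeholder):

```erlang
%% Spawn the benchmark in a process whose heap starts at 32768 words,
%% so the timings aren't dominated by repeated heap growth and copying.
Self = self(),
Pid = spawn_opt(fun() ->
                        %% placeholder workload
                        L = lists:seq(1, 100000),
                        Self ! {done, length(L)}
                end,
                [{min_heap_size, 32768}]),
receive {done, N} -> N end.
```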
Line-oriented I/O is slow. Sadly, yes, and Tim Bray found this out pretty early on. Here's to hoping it's better in the future, but in the meantime any line-oriented benchmark will be dominated by I/O. Use file:read_file to load the whole file at once, if you're not dealing with gigabytes of text.
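A sketch of the whole-file approach, splitting into lines in memory afterward (the filename is a placeholder, and binary:split on a binary is one way to do the splitting in more recent releases):

```erlang
%% Read the entire file with one call instead of looping over lines,
%% then split the binary into lines in memory.
%% "input.txt" is a placeholder filename.
{ok, Data} = file:read_file("input.txt"),
Lines = binary:split(Data, <<"\n">>, [global]),
length(Lines).
```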
The more functions exported from a module, the less optimization potential. It's common (and perfectly reasonable) to put -compile(export_all). in a module during development, but with every function exported the compiler has to assume any of them can be called from outside, so remove it before benchmarking.
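A sketch of the benchmarking-friendly shape: export only the entry point, so the compiler can see every caller of the internal functions (module and function names here are made up for illustration):

```erlang
-module(bench).
%% Export only the entry point. During development you might use
%% -compile(export_all). instead, but that limits optimization.
-export([run/0]).

run() -> loop(1000000, 0).

%% loop/2 stays private to the module, which gives the compiler
%% more freedom at its call sites.
loop(0, Acc) -> Acc;
loop(N, Acc) -> loop(N - 1, Acc + N).
```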
Inlining is off by default. I doubt you'll ever see big speedups from this, but it's worth adding -compile(inline). to a module to see what happens.
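A sketch of turning inlining on for a module (the module, step/1, and run/1 are illustrative names; you can also target specific functions with -compile({inline, [{step, 1}]}).):

```erlang
-module(inlined).
-export([run/1]).
%% Ask the compiler to inline small local functions in this module.
-compile(inline).

%% A tiny helper that's a good inlining candidate.
step(X) -> X * 2 + 1.

run(N) -> [step(X) || X <- lists:seq(1, N)].
```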
Large loop indices use bignum math. A "small" integer in Erlang fits into a single machine word, including the tag bits. (BEAM uses a staged tagging scheme so key types need fewer tag bits; small integers carry four of them, leaving 28 bits of integer on a 32-bit system.) If a benchmark has an outer loop counting down from ten billion to zero, then bignum math is used for most of that range. "Bignum" means that a value is larger than will fit into a single machine word, so math involves looping and manually handling some things that an add instruction automatically takes care of. Perhaps more significantly, each bignum is heap allocated, so even simple math like X + 1, where X is a bignum, causes the garbage collector to kick in more frequently.
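One way to sidestep this in a benchmark is to keep every counter inside the small-integer range, for example by splitting one huge count into two nested loops. A sketch (module and function names are made up):

```erlang
-module(loops).
-export([flat/1, nested/2]).

%% Counting down from ten billion in one loop: on a 32-bit system
%% the counter is a bignum for most of the range.
flat(0) -> ok;
flat(N) -> flat(N - 1).

%% Same total iteration count, but both counters stay small:
%% loops:nested(10000, 1000000) runs the inner loop ten thousand times.
nested(0, _Inner) -> ok;
nested(Outer, Inner) ->
    flat(Inner),
    nested(Outer - 1, Inner).
```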