I'm a recovering programmer who has been designing video games since the 1980s, doing things that seem baroquely hardcore in retrospect, like writing Super Nintendo games entirely in assembly language. These days I use whatever tools are the most fun and give me the biggest advantage.
james.hague @ gmail.com
Where are the comments?
Optimization in the Twenty-First CenturyI know, I know, don't optimize. Reduce algorithmic complexity and don't waste time on low-level noise. Or embrace the low-level and take advantage of magical machine instructions rarely emitted by compilers. Most of the literature on optimization focuses on these three recommendations, but in many cases they're no longer the best place to start. Gone are the days when you could look like a superstar by replacing long, linear lookups with a hash table. Everyone is already using the hash table from the get-go, because it's so easy.
And yet developers are still having performance problems, even on systems that are hundreds, thousands, or even over a hundred-thousand times than faster those which came before. Here's a short guide to speeding up applications in the modern world.
Get rid of the code you didn't need to write in the first place. Early programming courses emphasize writing lots of code, not avoiding it, and it's a hard habit to break. The first program you ever wrote was something like "Hello World!" It should have looked like this:
Fix that one big, dumb thing. There are some performance traps that look like everyday code, but can absorb hundreds of millions--or more--cycles. Maybe the most common is a function that manipulates long strings, adding new stuff to the end inside a loop. But, uh-oh, strings are immutable, so each of these append operations causes the entire multi-megabyte string to be copied.
It's also surprisingly easy to cause the CPU and GPU to become unintentionally synchronized, where one is waiting for the other. This is why reducing the number of times you hand-off vertex data to OpenGL or DirectX is a big deal. Sending a lone triangle to the GPU can be as expensive as rendering a thousand triangles. A more obscure gotcha is that writing to an OpenGL vertex buffer you've already sent off for rendering will stall the CPU until the drawing is complete.
Shrink your data. Smallness equals performance on modern hardware. You'll almost always win if you take steps to reduce the size of your data. More fits into cache. The garbage collector has less to trace through and copy around. Can you represent a color as an RGB tuple instead of a dictionary with the named elements "red", "green", and "blue"? Can you replace a bulky structure containing dozens of fields with a simpler, symbolic representation? Are you duplicating data that you could trivially compute from other values?
As an aside, the best across-the-board compilation option for most C/C++ compilers is "compile for size." That gets rid of optimizations that look good in toy benchmarks, but have a disproportionately high memory cost. If this saves you 20K in a medium-sized program, that's way more valuable for performance than any of those high-end optimizations would be.
Concurrency often gives better results than speeding up sequential code. Imagine you've written a photo editing app, and there's an export option where all the filters and lighting adjustments get baked into a JPEG. It takes about three seconds, which isn't bad in an absolute sense, but it's a long time for an interactive program to be unresponsive. With concerted effort you can knock a few tenths of a second off that, but the big win comes from realizing that you don't need to wait for the export to complete before continuing. It can be handled in a separate thread that's likely running on a different CPU core. To the user, exporting is now instantaneous.
(If you liked this, you might enjoy Use and Abuse of Garbage Collected Languages.)