I'm James Hague, a recovering programmer who has been designing video games since the 1980s. "This is Why You Spent All that Time Learning to Program" and "The Pure Tech Side is the Dark Side" are good places to start.
When most programmers hear a mention of Forth, assuming they're familiar with it at all, a series of memory fragments surface: stack, Reverse Polish Notation, SWAPping and DUPlicating values. While the stack and RPN are certainly important to Forth, they don't describe the essence of how the language actually works.
As an illustration, let's write a program to decode modern coffee shop orders. Things like:
I'd like a grande skinny latte
Gimme a tall mocha with an extra shot to go
The catch here is that we're not allowed to write a master parser for this, a program that slurps in the sentence and analyzes it for meaning. Instead, we can only look at a single word at a time, starting from the left, and each word can only be examined once--no rewinding.
To get around this arbitrary-seeming rule, each word (like "grande") will have a small program attached to it. Or more correctly, each word is the name of a program. In the second example above, first the program called gimme is executed, then tall, and so on.
Now what does each of these programs do? Some words are clearly noise: I'd, like, a, an, to, with. The program for each of these words simply returns immediately. "I'd like a," which is three programs, does absolutely nothing.
Now the first example ("I'd like a grande skinny latte"), ignoring the noise words, is "grande skinny latte." Three words. Three programs. The first program, grande, sets a Size variable to 2, indicating large. Likewise, tall sets this same variable to 1, and short sets it to 0. The second program, skinny, sets a Use_skim_milk flag to true. The third program, latte, records the drink name in a separate variable.
To use a more concise notation, here's a list of the programs for the second example:
When all of these programs have been executed, there's enough data stored in a handful of global variables to indicate the overall drink order, and we managed to dodge writing a real parser. Almost. There still needs to be one more program that looks at Size and so on. If we name that program EOL, then it executes after all the other programs, when end-of-line is reached. We can even handle rephrasings of the same order, like "mocha with an extra shot, tall, to go" with exactly the same code.
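The scheme described so far can be sketched in a few lines of Python. This is an illustration, not Forth, and it rests on several assumptions: Size, Use_skim_milk and EOL come from the text, but Drink, Extra_shot and To_go are made-up variable names, "gimme" and "extra" are treated as noise words, "shot" and "go" are assumed to set flags, and commas are stripped so the rephrased order works.

```python
# Global order state, standing in for the article's global variables.
# Size and Use_skim_milk are named in the text; Drink, Extra_shot and
# To_go are hypothetical names invented for this sketch.
order = {}

def reset():
    order.update(Size=None, Use_skim_milk=False, Drink=None,
                 Extra_shot=False, To_go=False)

def noise():
    """A noise word's program: return immediately."""
    pass

def setter(name, value):
    """Build a tiny program that stores one value in one variable."""
    def program():
        order[name] = value
    return program

# The dictionary: each word is the name of a program.
# Treating "gimme" and "extra" as noise is an assumption.
DICTIONARY = {word: noise for word in
              ["i'd", "like", "a", "an", "to", "with", "gimme", "extra"]}
DICTIONARY.update({
    "short":  setter("Size", 0),
    "tall":   setter("Size", 1),
    "grande": setter("Size", 2),
    "skinny": setter("Use_skim_milk", True),
    "latte":  setter("Drink", "latte"),
    "mocha":  setter("Drink", "mocha"),
    "shot":   setter("Extra_shot", True),   # assumed behavior
    "go":     setter("To_go", True),        # assumed behavior
})

def eol():
    """Runs at end of line: looks at Size and so on, reports the order."""
    return dict(order)

def decode(sentence):
    reset()
    # Look at one word at a time, left to right, each exactly once.
    for word in sentence.lower().split():
        DICTIONARY[word.strip(",.")]()
    return eol()

first = decode("Gimme a tall mocha with an extra shot to go")
second = decode("mocha with an extra shot, tall, to go")
print(first == second)  # True
```

Because every word only writes to its own variable, the order in which the words arrive doesn't matter, which is why the rephrased sentence decodes to the same result.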
The process just described is the underlying architecture of Forth: a dictionary of short programs. In Forth-lingo, each of these named programs is called a word. The main loop of Forth is simply an interpreter: read the next bit of text delimited by spaces, look it up in the dictionary, execute the program associated with it, repeat. In fact, even the Forth compiler works like this. Here's a simple Forth definition:
The colon is a word too, and the program attached to it first reads the next word from the input and creates a dictionary entry with that name. Then it does this: read the next word in the input, if the word is a semicolon then generate a return instruction and stop compiling, otherwise look up the word in the dictionary, compile a call to it, repeat.
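To make the colon mechanism concrete, here is a minimal Python sketch of the loop just described. It is not how any real Forth system is implemented: "compiling a call" is modeled as appending a Python function to a list, the word `usual` is invented for the demonstration, and `Drink` is a hypothetical variable name. Every program receives the remaining input so that the colon word can read ahead.

```python
order = {}        # variables the coffee words write to
dictionary = {}   # word name -> program taking the remaining input tokens

def tall(tokens):
    order["Size"] = 1

def mocha(tokens):
    order["Drink"] = "mocha"   # "Drink" is a made-up variable name

def colon(tokens):
    name = next(tokens)        # read the next word: the new entry's name
    body = []
    for word in tokens:        # keep reading words from the input
        if word == ";":        # semicolon: stop compiling (the "return")
            break
        body.append(dictionary[word])   # otherwise compile a call to the word

    def compiled(tokens):
        for program in body:
            program(tokens)

    dictionary[name] = compiled

dictionary.update({"tall": tall, "mocha": mocha, ":": colon})

def interpret(text):
    """Main loop: read a word, look it up, execute its program, repeat."""
    tokens = iter(text.split())
    for word in tokens:
        dictionary[word](tokens)

# Define a new word out of existing ones, then execute it.
interpret(": usual tall mocha ; usual")
print(order)
```

The colon word and the main loop share one token iterator, so when compilation stops at the semicolon the interpreter simply carries on with whatever follows it.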
So where do stacks and RPN come into the picture? Our coffee shop drink parser is simple, but it's a front for a jumble of variables behind the scenes. If you're up for some inelegant code, you could do math with the same approach. "5 + 3" is three words:
but this is clunky and breaks down quickly. A stack is a good way to keep information flowing between words, maybe the best way, but you could create a dictionary-based language that didn't use a stack at all. Each function in Backus's FP, for example, creates a value or data structure which gets passed to the next function in sequence. There's no stack.
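The contrast between the two styles can be sketched in Python. The first function is one guess at what the variable-based version of "5 + 3" might look like; `Result` and `Pending_op` are hypothetical names, not taken from the article. The second passes values between words on a stack and takes its input in RPN order.

```python
import operator

def run_with_variables(text):
    """'5 + 3' style: words communicate through named variables."""
    variables = {"Result": None, "Pending_op": None}   # hypothetical names

    def number(n):
        if variables["Pending_op"] is None:
            variables["Result"] = n
        else:
            # A number following an operator triggers the actual math.
            variables["Result"] = variables["Pending_op"](variables["Result"], n)
            variables["Pending_op"] = None

    def plus():
        variables["Pending_op"] = operator.add

    for word in text.split():
        if word == "+":
            plus()
        else:
            number(int(word))
    return variables["Result"]

def run_with_stack(text):
    """'5 3 +' style: words communicate through a stack."""
    stack = []
    for word in text.split():
        if word == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            stack.append(int(word))
    return stack.pop()

print(run_with_variables("5 + 3"))   # 8
print(run_with_stack("5 3 +"))       # 8
```

The variable version has room for exactly one pending value and one pending operation, so anything that needs to hold several intermediate values at once has nowhere to put them. The stack version handles "2 3 4 + +" with no extra machinery.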
Finally, just to show that my fictional notation is actually close to real Forth, here's a snippet of code for the drink decoder: