Understanding What It's Like to Program in Forth

I write Forth code every day. It is a joy to write a few simple words and solve a problem. As brain exercise it far surpasses cards, crosswords or Sudoku

—Chuck Moore, creator of Forth

I've used and enjoyed Forth quite a bit over the years, though I rarely find myself programming in it these days. Among other projects, I've written several standalone tools in Forth, used it for exploratory programming, written a Forth-like language for handling data assets for a commercial project, and written two standalone 6502 cross assemblers using the same principles as Forth assemblers.

It's easy to show how beautiful Forth can be. The classic example is:

: square dup * ;

There's also Leo Brodie's oft-cited washing machine program. But as pretty as these code snippets are, they're the easy, meaningless examples, much like the two-line quicksort in Haskell. They're trotted out to show the strengths of a language, then reiterated by new converts. The primary reason I wrote the Purely Functional Retrogames series is the disconnect between advocates saying everything is easy without destructive updates and the utter lack of examples of how to approach many kinds of problems in a purely functional way. The same small set of pretty examples isn't enough to understand what it's like to program in a particular language or style.

Chuck Moore's Sudoku quote above is one of the most accurate characterizations of Forth that I've seen. Once you truly understand it, you'll better see what's fun about the language, and also why it isn't as commonly used. What I'd like to do is to start with a trivially simple problem, one that's completely straightforward, even simpler than the infamous FizzBuzz:

Write a Forth word to add together two integer vectors (a.k.a. arrays) of three elements each.

The C version, without bothering to invent custom data types, requires no thought:

void vadd(int *v1, int *v2, int *v3)
{
       v3[0] = v1[0] + v2[0];
       v3[1] = v1[1] + v2[1];
       v3[2] = v1[2] + v2[2];
}

In Erlang it's:

vadd({A,B,C}, {D,E,F}) -> {A+D, B+E, C+F}.

In APL and J the solution is a single character:

+

first Forth attempt

So now, Forth. We start with a name and stack picture:

: vadd ( v1 v2 v3 -- )

Getting the first value out of v1 is easy enough:

rot dup @

"rot" brings v1 to the top, then we grab the first element of the array (remember that we need to keep v1 around, hence the dup). Hmmm...now we've got four items on the stack:

v2 v3 v1 a

"a" is what I'm calling the first element of v1, using the same letters as in the Erlang function. There's no way to get v2 to the top of the stack, save the deprecated word pick, so we're stuck.

second Forth attempt

Thinking about this a bit more, the problem is we have too many items being dealt with at once, too many items on the stack. v3 sitting there on top is getting in the way, so what if we moved it somewhere else for a while? The return stack is the standard location for a temporary value, so let's try it:

>r over @ over @ + r> !

Now that works. We get v3 out of the way, fetch the first elements of v1 and v2 (keeping both addresses around for later use), then bring back v3 and store the result. Well, almost, because now v3 is gone and we can't use it for the second and third elements.
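
To check that stack reasoning without a Forth system at hand, here is a rough C model of the two stacks that runs the same word sequence. Everything in it is an assumption made for illustration (the cell type, the fixed 16-entry stacks, and helper names such as to_r and second_attempt); it is a sketch of the idea, not how any real Forth is implemented.

```c
#include <stdint.h>

typedef intptr_t cell;                 /* one cell: holds an integer or an address */

static cell ds[16]; static int dsp;    /* data stack and its depth */
static cell rs[16]; static int rsp;    /* return stack and its depth */

static void push(cell x) { ds[dsp++] = x; }
static cell pop(void)    { return ds[--dsp]; }

static void to_r(void)   { rs[rsp++] = pop(); }                      /* >r   */
static void r_from(void) { push(rs[--rsp]); }                        /* r>   */
static void over(void)   { push(ds[dsp - 2]); }                      /* over */
static void fetch(void)  { cell *a = (cell *)pop(); push(*a); }      /* @    */
static void plus(void)   { cell b = pop(); push(pop() + b); }        /* +    */
static void store(void)  { cell *a = (cell *)pop(); *a = pop(); }    /* !    */

/* >r over @ over @ + r> !   stack effect: ( v1 v2 v3 -- v1 v2 ) */
void second_attempt(void)
{
    to_r();                /* park v3 on the return stack   */
    over(); fetch();       /* first element of v1           */
    over(); fetch();       /* first element of v2           */
    plus();
    r_from(); store();     /* v3 comes back and is consumed */
}
```

Pushing the addresses of three cell arrays and calling second_attempt() leaves the sum in the first cell of v3 and only v1 and v2 on the model's data stack, which is the dead end described above.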

third Forth attempt

This isn't as bad as it sounds. We can just keep v3 over on the return stack for the whole function. Here's an attempt at the full version of vadd:

: vadd ( v1 v2 v3 -- )
       >r
       over @ over @ + r@ !
       over cell+ @ over cell+ @ + r@ cell+ !
       over 2 cells + @ over 2 cells + @ + r> 2 cells + !
       drop drop ;

cell+ is roughly the same as ++ in C. "2 cells +" is equivalent to "cell+ cell+". Notice how v3 stays on the return stack for most of the function, being fetched with r@. The "drop drop" at the end is to get rid of v1 and v2. Some nicer formatting helps show the symmetry of this word:

: vadd ( v1 v2 v3 -- )
       >r
       over           @  over           @  + r@           !
       over cell+     @  over cell+     @  + r@ cell+     !
       over 2 cells + @  over 2 cells + @  + r> 2 cells + !
       drop drop ;

This can be made more obvious by defining some vector access words:

: 1st ;
: 2nd cell+ ;
: 3rd 2 cells + ;
: vadd ( v1 v2 v3 -- )
       >r
       over 1st @  over 1st @  + r@ 1st !
       over 2nd @  over 2nd @  + r@ 2nd !
       over 3rd @  over 3rd @  + r> 3rd !
       drop drop ;

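A little bit of extra verbosity removes one quirk in the pattern, the lone r> on the last line. Here rdrop discards the top of the return stack; where it isn't available, "r> drop" does the same: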

: vadd ( v1 v2 v3 -- )
       >r
          over 1st @  over 1st @  + r@ 1st !
          over 2nd @  over 2nd @  + r@ 2nd !
          over 3rd @  over 3rd @  + r@ 3rd !
       rdrop drop drop ;

And that's it--three element vector addition in Forth. One solution at least; I can think of several completely different approaches, and I don't claim that this is the most concise of them. It has some interesting properties, not the least of which is that there aren't any named variables. On the other hand, all of this puzzling, all this revision...to solve a problem which takes no thought at all in most languages. And while the C version can be switched from integers to floating point values just by changing the parameter types, that change would require completely rewriting the Forth code, because there's a separate floating point stack.
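
To make the C half of that comparison concrete, here is roughly what the floating point variant looks like. The name vaddf and the const qualifiers are my own additions for the sketch; the body is the same three lines as the integer vadd.

```c
/* Hypothetical floating point variant of the earlier vadd:
   only the parameter types (and the name) change. */
void vaddf(const float *v1, const float *v2, float *v3)
{
    v3[0] = v1[0] + v2[0];
    v3[1] = v1[1] + v2[1];
    v3[2] = v1[2] + v2[2];
}
```

The function body is untouched. In Forth the values would live on a separate floating point stack, so the juggling with over and the return stack would have to be worked out again from scratch.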

Still, it was enjoyable to work this out. Better than Sudoku? Yes.

permalink August 2, 2008
