Functional Programming Archaeology

John Backus's Turing Award Lecture from 1977, Can Programming be Liberated from the Von Neumann Style? (warning: large PDF), was a key event in the history of functional programming. Not all of the ideas in the paper originated with Backus, and Dijkstra publicly criticized it as poorly thought through, but it did spur interest in functional programming research, which eventually led to languages such as Haskell. The paper is also historically interesting as a crystallization of the beliefs about the benefits of functional programming at the time. Two of those beliefs jump out at me.

The first is concurrency as a primary motivation. If a program is just a series of side-effect-free expressions, then there's no requirement that it be executed sequentially. In a function call like this:
f(ExpressionA, ExpressionB, ExpressionC)
the three expressions have no interdependencies and can be executed in parallel. This could, in theory, apply all the way down to pieces of expressions. In this snippet of code:
(a + b) * (c + d)
the two additions could be performed at the same time. This fine-grained concurrency was seen as a key benefit of purely functional programming languages, but it fizzled, both because of the difficulty in determining how to parallelize programs efficiently and because it was a poor match for monolithic CPUs.

The second belief which has dropped off the radar since 1977 is the concept of an algebra of programs. Take this simple C expression:
!x
Assuming x is a truth value (either 0 or 1), !x gives the same result as each of these expressions:
1 - x
x ^ 1
(x + 1) & 1
If the last of these appeared in code, then it could be mechanically translated to one of the simpler equivalents. Going further, you could imagine an interactive tool that would allow substitution of equivalent expressions, maybe even pointing out expressions that can be simplified.
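Because the domain has only two values, each claimed equivalence can be checked exhaustively. The Python sketch below does exactly that, with `logical_not` as an illustrative stand-in for C's `!`. The restriction to 0 and 1 matters: for x = 2, !x is 0 but 1 - x is -1, so the rewrite rules are only valid under the stated assumption.

```python
def logical_not(x):
    """Stand-in for C's !x on an int: 1 if x is 0, otherwise 0."""
    return int(not x)

# Candidate rewrites of !x, each written as a Python function.
candidates = {
    "1 - x": lambda x: 1 - x,
    "x ^ 1": lambda x: x ^ 1,
    "(x + 1) & 1": lambda x: (x + 1) & 1,
}

def equivalent_on_truth_values(f):
    """True if f agrees with logical_not for every x in {0, 1}."""
    return all(f(x) == logical_not(x) for x in (0, 1))

for text, f in candidates.items():
    print(text, equivalent_on_truth_values(f))  # each prints True
```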

Now in C this isn't all that useful. And in Erlang or Haskell it's not all that useful either, unless you avoid writing explicitly recursive functions with named values and instead express programs as a series of canned manipulations. This is the so-called point-free style which has a reputation for density to the point of opaqueness.

In Haskell code, point-free style is common, but not aggressively so. Rather than trying to work out a way to express a computation as the application of existing primitives, it's usually easier to write an explicitly recursive function. Haskell programmers aren't taught to lean on core primitive functions wherever possible, and core primitive functions weren't necessarily designed with that goal in mind. Sure, there's the usual map and fold and so on, but not a set of functions that would allow 90% of all programs to be expressed as application of those primitives.
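To make the contrast concrete, here is one small computation, the sum of the squares of a list, written both ways in Python. `compose` is a helper defined in the block because Python has no built-in composition operator; the point-free version is assembled only from `sum`, `map`, and `functools.partial` and never names its argument. (In Haskell, the point-free form would be `sum . map (^2)`.)

```python
from functools import partial, reduce
from operator import mul

def compose(*fs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fs), x)

def square(x):
    return mul(x, x)

# Explicit recursion with named values: the usual way to write it.
def sum_of_squares_recursive(xs):
    if not xs:
        return 0
    head, *tail = xs
    return square(head) + sum_of_squares_recursive(tail)

# Point-free: a composition of canned pieces with no named argument.
sum_of_squares_pointfree = compose(sum, partial(map, square))

print(sum_of_squares_recursive([1, 2, 3]))  # 14
print(sum_of_squares_pointfree([1, 2, 3]))  # 14
```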

Can Programming be Liberated... introduced fp, a language which didn't catch on and left very little in the way of tutorials or useful programming examples. fp was clearly influenced by Ken Iverson's APL, a language initially defined in 1962 (and unlike fp, you can still hunt down production code written in APL). The APL lineage continued after Backus's paper, eventually leading to APL2 and J (both of which involved Iverson) and a second branch of languages created by a friend of Iverson, Arthur Whitney: A+, K, and Q. Viewed in the right light, J is a melding of APL and fp. And the "build a program using core primitives" technique lives on in J.

Here's a simple problem: given an array (or list, if you prefer), return the indices of values which are greater than 5. For example, this input:
1 2 0 6 8 3 9
gives this result:
3 4 6
which means that the elements in the original array at positions 3, 4, and 6 (where the first position is zero, not one) are all greater than 5. I'm using the APL/J/K list notation here, instead of the Haskelly [3,4,6]. How can we transform the original array into 3 4 6 without explicit loops, recursion, or named values?
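For comparison, here's the kind of answer the question rules out: a conventional Python loop with an index, an accumulator, and a test. The function name is made up for illustration.

```python
def indices_greater_than(values, threshold):
    """Conventional version: explicit loop, explicit index, named values."""
    result = []
    for i, v in enumerate(values):
        if v > threshold:
            result.append(i)
    return result

print(indices_greater_than([1, 2, 0, 6, 8, 3, 9], 5))  # [3, 4, 6]
```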

First, we can find out which elements in the input list are greater than 5. This doesn't give us their positions, but it's a start.
1 2 0 6 8 3 9 > 5
0 0 0 1 1 0 1
The first line is the input, the second the output. Greater than, like most J functions, operates on whole arrays, kind of like all operators in Haskell having map built in. The above example checks if each element of the input array is greater than 5 and returns an array of the results (0 = false, 1 = true).
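Python's comparison operators don't extend over lists the way J's do, so the closest point-free equivalent maps a partially applied comparison over the input. A sketch, with `greater_than_5` and `mask` as illustrative names; `partial(lt, 5)` computes 5 < v, which is the same test as v > 5.

```python
from functools import partial
from operator import lt

data = [1, 2, 0, 6, 8, 3, 9]

# partial(lt, 5)(v) evaluates 5 < v, i.e. v > 5.
greater_than_5 = partial(lt, 5)

# Map the comparison over the list, then turn True/False into 1/0
# to match J's output.
mask = list(map(int, map(greater_than_5, data)))
print(mask)  # [0, 0, 0, 1, 1, 0, 1]
```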

There's another J primitive that builds a list of values from 0 up to n-1:
i. 5
0 1 2 3 4
Yes, extreme terseness is characteristic of J; just let it go for now. One interesting thing we can do with our original input is build a list of integers as long as the array.
i. # 1 2 0 6 8 3 9
0 1 2 3 4 5 6
(# is the length function.) Stare at this for a moment, and you'll see that the result is a list of the valid indices for the input array. So far we've got two different arrays created from the same input: 0 0 0 1 1 0 1 (where a 1 means "greater than 5") and 0 1 2 3 4 5 6 (the list of indices for the array). Now we take a bit of a leap. Pair these two arrays together, element by element: (first element of the first array, first element of the second array), and so on, like this:
(0,0) (0,1) (0,2) (1,3) (1,4) (0,5) (1,6)
This isn't J notation; it's just a way of showing the pairings. Notice that if you remove all pairs that have a zero in the first position, then only three pairs are left. And the second elements of those pairs make up the answer we're looking for: 3 4 6. It turns out that J has an operator for pairing up arrays like this, where the first element is a count and the second is a value to repeat count times. Sort of a run-length expander. The key is that a count of zero can be viewed as "delete me" and a count of 1 as "copy me as is." Or in actual J code:
0 0 0 1 1 0 1 # 0 1 2 3 4 5 6
3 4 6
And there's our answer, finally! (Note that # in this case, with an operand on each side of it, is the "copy" function rather than the length function.) If you're ever going to teach a beginning programming course, go ahead and learn J first, so you can remember what it's like to be an utterly confused beginner.
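The whole pipeline can be transliterated into Python as a rough sketch. Here `copy` is a hypothetical helper that mimics dyadic # by pairing counts with values, as in the walkthrough above, and repeating each value count times; when every count is 0 or 1, the standard library's `itertools.compress` performs the same selection.

```python
from functools import partial
from itertools import compress
from operator import lt

def copy(counts, values):
    """Rough analogue of J's dyadic #: zip counts with values and repeat
    each value `count` times (0 deletes it, 1 keeps it as is)."""
    return [v for n, v in zip(counts, values) for _ in range(n)]

data = [1, 2, 0, 6, 8, 3, 9]

mask = list(map(int, map(partial(lt, 5), data)))  # J: data > 5
indices = range(len(data))                        # J: i. # data

print(copy(mask, indices))               # [3, 4, 6]
print(copy([2, 0, 1], ["a", "b", "c"]))  # ['a', 'a', 'c']

# With only 0/1 counts, itertools.compress makes the same selection.
print(list(compress(indices, mask)))     # [3, 4, 6]
```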

In the APL/J/K worlds, there's a collection of well-known phrases (that is, short sequences of functions) for operations like this, each made up of primitives. It's the community of programmers with the most experience working in a point-free style. Though I doubt those programmers consider themselves to be working with "an algebra of programs," as Backus envisioned, the documentation is sprinkled with snippets of code declared to be equivalent to primitives or other sequences of functions.
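Even without a formal algebra, a claim that two phrases are equivalent can at least be spot-checked mechanically. A minimal Python sketch that compares two formulations of the indices-greater-than-5 phrase on random inputs; the phrase names and the random-testing approach are illustrative, and agreement on samples is evidence of equivalence, not a proof.

```python
import random
from itertools import compress

def phrase_compress(values):
    """Indices selected by compressing the index range with a boolean mask."""
    return list(compress(range(len(values)), (v > 5 for v in values)))

def phrase_enumerate(values):
    """Indices selected by enumerating and filtering."""
    return [i for i, v in enumerate(values) if v > 5]

def agree_on_random_inputs(f, g, trials=1000, seed=0):
    """Return False as soon as f and g disagree on a random list."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(0, 10) for _ in range(rng.randint(0, 12))]
        if f(values) != g(values):
            return False
    return True

print(agree_on_random_inputs(phrase_compress, phrase_enumerate))  # True
```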