<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Programming in the 21st Century</title>
<link rel="self" href="http://prog21.dadgum.com/atom.xml" />
<link rel="alternate" href="http://prog21.dadgum.com/" />
<id>http://prog21.dadgum.com/</id>
<updated>2009-06-15T00:00:00-06:00</updated>
<entry>
<title>How to Crash Erlang</title>
<link rel="alternate" type="text/html" href="http://prog21.dadgum.com/43.html" />
<id>http://prog21.dadgum.com/43.html</id>
<published>2009-06-15T00:00:00-06:00</published>
<updated>2009-06-15T00:00:00-06:00</updated>
<author><name>James Hague</name></author>
<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Now that's a loaded title, and I know some people will immediately see it as a personal slam on Erlang or ammunition for berating the language in various forums.  I mean neither of these.  Crashing a particular language, even so-called safe interpreted implementations, is not particularly challenging.  Running out of memory or stack space are two easy options that work for most languages.  There are pathological cases for regular expressions that may not truly crash, but result in such an extended period of unresponsiveness on large data sets that the difference is moot.  In any language that allows directly linking to arbitrary operating system functions...well, that's just too easy.
<br/><br/>Erlang, offering more complex features than many languages, has some particularly interesting edge cases.
<br/><br/><b>Run out of atoms.</b>  Atoms in Erlang are analogous to symbols in Lisp--that is, symbolic, non-string identifiers that make code more readable, like <tt>green</tt> or <tt>unknown_value</tt>--with one exception.  Atoms in Erlang are not garbage collected.  Once an atom has been created, it lives as long as the Erlang node is running.  An easy way to crash the Erlang virtual machine is to loop from 1 to some large number, calling <tt>integer_to_list</tt> and then <tt>list_to_atom</tt> on the current loop index.  The atom table will fill up with unused entries, eventually bringing the runtime system to a halt.
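<br/><br/>Here's a minimal sketch of that loop, under a couple of assumptions: the module name <tt>atom_leak</tt> and the function <tt>make_atoms</tt> are made up for illustration, and <tt>erlang:system_info(atom_count)</tt> is only available on reasonably recent OTP releases.
<pre>-module(atom_leak).
-export([make_atoms/1]).

%% Create one atom per integer from N down to 1 ('1', '2', ...).
%% None of these atoms is ever reclaimed while the node is running.
make_atoms(0) -&gt;
   ok;
make_atoms(N) when is_integer(N), N &gt; 0 -&gt;
   _ = list_to_atom(integer_to_list(N)),
   make_atoms(N - 1).
</pre>Calling <tt>atom_leak:make_atoms(1000)</tt> is harmless, and comparing <tt>erlang:system_info(atom_count)</tt> before and after shows the table growing.  Pass a count larger than the atom table limit and the node goes down.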
<br/><br/>Why is this allowed?  Because garbage collecting atoms would involve a pass over all data in all processes, something the garbage collector <a href="16.html">was specifically designed</a> to avoid.  And in practice, running out of atoms will only happen if you write code that's generating new atoms on the fly.
<br/><br/><b>Run out of processes.</b>  Or similarly, "run out of memory because you've spawned so many processes."  While the sequential core of Erlang leans toward being purely functional, the concurrent side is decidedly imperative.  If you spawn a non-terminating, unlinked process, and manage to lose the process id for it, then it will just sit there, waiting forever.  You've got a process leak.
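<br/><br/>A process leak of this kind can be sketched in a few lines.  The names <tt>proc_leak</tt> and <tt>leak</tt> are hypothetical, and the <tt>never_sent</tt> message is just a placeholder for something that never arrives.
<pre>-module(proc_leak).
-export([leak/0]).

%% Spawn an unlinked process that waits for a message nobody will send,
%% then throw away its pid.  The process stays alive indefinitely.
leak() -&gt;
   _Pid = spawn(fun() -&gt; receive never_sent -&gt; ok end end),
   ok.
</pre>Each call to <tt>proc_leak:leak()</tt> adds one more process to the total reported by <tt>erlang:system_info(process_count)</tt>, and nothing ever removes it.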
<br/><br/><b>Flood the mailbox for a process.</b>  This is something that most new Erlang programmers do sooner or later.  One process sends messages to another process without waiting for a reply, and a missing or incorrect pattern in the <tt>receive</tt> statement causes the receiver to ignore all messages...so they keep piling up until the mailbox fills all available memory, and that's that.  Another reminder that concurrency in Erlang is imperative.
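<br/><br/>As a rough illustration of the mismatch, here is a sketch in which the receiver only matches <tt>{job, _}</tt> while the sender uses the tag <tt>work</tt>.  The module name, function names, and both tags are invented for the example.
<pre>-module(mailbox_flood).
-export([demo/1]).

%% The receiver only matches {job, _}; anything else stays in the mailbox.
receiver() -&gt;
   receive
      {job, _} -&gt; receiver()
   end.

%% Send N messages with the wrong tag, then report how many are queued.
demo(N) -&gt;
   Pid = spawn(fun receiver/0),
   lists:foreach(fun(I) -&gt; Pid ! {work, I} end, lists:seq(1, N)),
   {message_queue_len, Len} = process_info(Pid, message_queue_len),
   exit(Pid, kill),   % clean up; a real leak would leave this running
   Len.
</pre>Here <tt>mailbox_flood:demo(1000)</tt> should report that all 1000 messages are still sitting in the queue.  Remove the <tt>exit</tt> call and keep sending, and the queue grows until memory runs out.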
<br/><br/><b>Create too many large binaries in a single process.</b>  Large--greater than 64 bytes--binaries are allocated outside of the per-process heap and are reference counted.  The catch is that the reference count indicates how many <i>processes</i> have access to the binary, not how many different pointers there are to it within a process.  That makes the runtime system simpler, but it's not bulletproof.  When garbage collection occurs for a process, unreferenced binaries are deleted, but that's only when garbage collection occurs.  It's possible to create a large process with a slowly growing heap, and create so much binary garbage that the system runs out of memory before garbage collection occurs.  Unlikely, yes, but possible.
</div></content>
</entry>
<entry>
<title>Digging Deeper into Sufficiently Smartness</title>
<link rel="alternate" type="text/html" href="http://prog21.dadgum.com/42.html" />
<id>http://prog21.dadgum.com/42.html</id>
<published>2009-06-14T00:00:00-06:00</published>
<updated>2009-06-14T00:00:00-06:00</updated>
<author><name>James Hague</name></author>
<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">(If you haven't read <a href="40.html">On Being Sufficiently Smart</a>, go ahead and do so, otherwise this short note won't have any context.)
<br/><br/>I frequently write Erlang code that builds a list which ends up backward, so I call <tt>lists:reverse</tt> at the very end to flip it around.  This is a common idiom in functional languages.
<br/><br/><tt>lists:reverse</tt> is a built-in function in Erlang, meaning it's implemented in C, but for the sake of argument let's say that it's written in Erlang instead.  This is super easy, so why not?
<pre>reverse(L) -&gt; reverse(L, []).
reverse([H|T], Acc) -&gt;
   reverse(T, [H|Acc]);
reverse([], Acc) -&gt;
   Acc.
</pre>Now suppose there's another function that uses <tt>reverse</tt> at the very end, just before returning:
<pre>collect_digits(L) -&gt; collect_digits(L, []).
collect_digits([H|T], Acc) when H &gt;= $0, H =&lt; $9 -&gt;
   collect_digits(T, [H|Acc]);
collect_digits(_, Acc) -&gt;
   reverse(Acc).
</pre>This function returns a list of ASCII digits that prefix a list, so <tt>collect_digits("1234.0")</tt> returns <tt>"1234"</tt>.  And now one more "suppose": suppose that one time we decide that we really need to process the result of <tt>collect_digits</tt> backward, so we do this:
<pre>reverse(collect_digits(List))
</pre>The question is, can the compiler detect that there's a double reverse?  In theory, the last <tt>reverse</tt> could be dropped from <tt>collect_digits</tt> in the generated code, and each call to <tt>collect_digits</tt> could be automatically wrapped in a call to <tt>reverse</tt>.  If there end up being two calls to <tt>reverse</tt>, then get rid of both of them, because it's just wasted effort to double-reverse a list.
<br/><br/>With <tt>lists:reverse</tt> as a built-in, this is easy enough.  But can it be deduced simply from the raw source code that <tt>reverse(reverse(List))</tt> can be replaced with <tt>List</tt>?  Is that effort easier than simply special-casing the list reversal function?
</div></content>
</entry>
<entry>
<title>Let's Take a Trivial Problem and Make it Hard</title>
<link rel="alternate" type="text/html" href="http://prog21.dadgum.com/41.html" />
<id>http://prog21.dadgum.com/41.html</id>
<published>2009-05-04T00:00:00-06:00</published>
<updated>2009-05-04T00:00:00-06:00</updated>
<author><name>James Hague</name></author>
<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Here's a simple problem:  Given a block of binary data, count the frequency of the bytes within it.  In C, this could be a homework assignment for an introductory class.  Just zero out an array of 256 elements, then for each byte increment the appropriate array index.  Easy.
<br/><br/>Now write this in a purely functional way, with an efficiency close to that of the C implementation.
<br/><br/>It's easy to do a straightforward translation to Erlang, using tail recursion instead of a <tt>for</tt> loop, like this:
<pre>freq(B) when is_binary(B) -&gt;
   freq(B, erlang:make_tuple(256, 0)).

freq(&lt;&lt;X, Rest/binary&gt;&gt;, Totals) -&gt;
   I = X + 1,
   N = element(I, Totals),
   freq(Rest, setelement(I, Totals, N + 1));
freq(&lt;&lt;&gt;&gt;, Totals) -&gt;
   Totals.
</pre>But of course in the name of purity and simplicity, <tt>setelement</tt> copies the entire <tt>Totals</tt> tuple, so if there are fifty million bytes, then the 256-element <tt>Totals</tt> tuple is copied fifty million times.  It's simple, but it's not the right approach.
<br/><br/>"Blame the compiler" is another easy option.  If it could be determined that the <tt>Totals</tt> tuple can be destructively updated, then we're good.  Note that the garbage collector in the Erlang runtime is based on the assumption that pointers in the heap always point toward older data, an assumption that could break if a tuple was destructively updated with, say, a list value.  So not only would the compiler have to deduce that the tuple is only used locally, but it would also have to verify that only non-pointer values (like integers and atoms) were being passed in as the third parameter of <tt>setelement</tt>.  This is all possible, but it doesn't currently work that way, so this line of reasoning is a dead end for now.
<br/><br/><tt>Totals</tt> could be switched from a tuple to a tree, which might or might not be better than the <tt>setelement</tt> code, but there's no way it's in the same ballpark as the C version.
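<br/><br/>For concreteness, a tree-based variant might look like the sketch below.  It assumes <tt>gb_trees</tt> from the standard library as the tree, which is my pick for illustration rather than the only candidate.  The module name <tt>freq_tree</tt> is made up, and the result is a sorted list of <tt>{Byte, Count}</tt> pairs instead of a 256-element tuple.
<pre>-module(freq_tree).
-export([freq/1]).

freq(B) when is_binary(B) -&gt;
   freq(B, gb_trees:empty()).

%% Look up the current count for byte X (0 if unseen) and store it back
%% incremented.  Each update rebuilds one path of the tree, not all of it.
freq(&lt;&lt;X, Rest/binary&gt;&gt;, Totals) -&gt;
   N = case gb_trees:lookup(X, Totals) of
          {value, V} -&gt; V;
          none -&gt; 0
       end,
   freq(Rest, gb_trees:enter(X, N + 1, Totals));
freq(&lt;&lt;&gt;&gt;, Totals) -&gt;
   gb_trees:to_list(Totals).
</pre>With this, <tt>freq_tree:freq(&lt;&lt;"abca"&gt;&gt;)</tt> returns <tt>[{97,2},{98,1},{99,1}]</tt>.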
<br/><br/>What about a different algorithm?  Sort the block of bytes, then count runs of identical values.  Again, just the suggestion of sorting means we're already off track.
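<br/><br/>The sort-and-count idea can be sketched as follows.  The module name <tt>freq_sort</tt> and the helper <tt>runs</tt> are invented for the example, and it converts the whole binary to a list first, which is itself a sign of how far this is from the C loop.
<pre>-module(freq_sort).
-export([freq/1]).

%% Sort the bytes, then collapse each run of equal values into {Byte, Count}.
freq(B) when is_binary(B) -&gt;
   runs(lists:sort(binary_to_list(B))).

runs([]) -&gt; [];
runs([H|T]) -&gt; runs(T, H, 1).

%% runs(Remaining, CurrentByte, CountSoFar)
runs([H|T], H, N) -&gt; runs(T, H, N + 1);
runs([X|T], H, N) -&gt; [{H, N} | runs(T, X, 1)];
runs([], H, N) -&gt; [{H, N}].
</pre>As with the tree version, <tt>freq_sort:freq(&lt;&lt;"abca"&gt;&gt;)</tt> returns <tt>[{97,2},{98,1},{99,1}]</tt>.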
<br/><br/>Honestly, I don't know the right answer.  In Erlang, I'd go for one of the imperative efficiency hacks, like ets tables, but let's back up a bit.  The key issue here is that there are some fundamental assumptions about what "purely functional" means and the expected features in functional languages.
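<br/><br/>For reference, the ets approach could be sketched like this.  It's one possible shape rather than a tuned implementation: the module name <tt>freq_ets</tt> and the table name <tt>freq_totals</tt> are made up, and the table is private to the calling process and deleted before returning, so the mutation never escapes the function.
<pre>-module(freq_ets).
-export([freq/1]).

freq(B) when is_binary(B) -&gt;
   T = ets:new(freq_totals, [set, private]),
   %% One {Byte, Count} row per possible byte value, all starting at zero.
   true = ets:insert(T, [{I, 0} || I &lt;- lists:seq(0, 255)]),
   ok = count(B, T),
   %% Copy the counters out into an ordinary immutable tuple.
   Totals = list_to_tuple([N || {_, N} &lt;- lists:sort(ets:tab2list(T))]),
   true = ets:delete(T),
   Totals.

%% Destructively bump the counter for each byte in turn.
count(&lt;&lt;X, Rest/binary&gt;&gt;, T) -&gt;
   _ = ets:update_counter(T, X, 1),
   count(Rest, T);
count(&lt;&lt;&gt;&gt;, _T) -&gt;
   ok.
</pre>The result has the same shape as the tuple version of <tt>freq</tt> shown earlier: a 256-element tuple where element <tt>X + 1</tt> holds the count for byte <tt>X</tt>.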
<br/><br/>In array languages, like <a href="http://www.jsoftware.com">J</a>, this type of problem is less awkward, as it's closer to what they were designed for.  If nothing else, reference counted arrays make it easier to tell when destructive updates are safe.  And there's usually some kind of classification operator, one that would group the bytes by value for easy counting. That's still not going to be as efficient as C, but it's clearly higher-level than the literal Erlang translation.
<br/><br/>A more basic question is this: "Is destructively updating a local array a violation of purely functionalness?"  OCaml allows destructive array updates and C-like control structures.  If a local array is updated inside of an OCaml function, then the result copied to a non-mutable array at the end, is there really anything wrong with that?  It's not the same as randomly sticking your finger inside a global array somewhere, causing a week's worth of debugging.  In fact, it looks <i>exactly the same as the purely functional version</i> from the caller's point of view.
<br/><br/>Perhaps the sweeping negativity about destructive updates is misplaced.
</div></content>
</entry>
</feed>