The Last Line Effect and Extralinguistic Mechanisms

Since the release of the Racket Manifesto by Felleisen et al., it has become a thesis of mine that ordinary programmers regularly face real problems that could be alleviated by the sorts of macro mechanisms offered by Racket (and, to a lesser extent, other Lisps). I’d like to perform some in-depth investigations to really back this argument up, but until I have time for that, a recent article will serve as a teaser.

Andrey Karpov wrote in “The Last Line Effect”:

When writing program code, programmers often have to write a series of similar constructs. Typing the same code several times is boring and inefficient. That’s why they use the Copy-Paste method: a code fragment is copied and pasted several times with further editing. Everyone knows what is bad about this method: you risk easily forgetting to change something in the pasted lines and thus giving birth to errors. Unfortunately, there is often no better alternative to be found.

Now let’s speak of the pattern I discovered. I figured out that mistakes are most often made in the last pasted block of code.

Andrey goes on to provide some evidence for his assertion. I think he’s on to something, and it makes sense. You do your pasting, then you set your hands in motion with some repeated delete delete delete type type type down arrow, repeat. By the time you’ve done that three or four times, your mind is numb, your eyes are glazed, and the muscles aren’t being supervised anymore.

I’d like to point out and discuss two of his examples:

Example, Chromium

if (access & FILE_WRITE_ATTRIBUTES)
  output.append(ASCIIToUTF16("\tFILE_WRITE_ATTRIBUTES\n"));
if (access & FILE_WRITE_DATA)
  output.append(ASCIIToUTF16("\tFILE_WRITE_DATA\n"));
if (access & FILE_WRITE_EA)
  output.append(ASCIIToUTF16("\tFILE_WRITE_EA\n"));
if (access & FILE_WRITE_EA)
  output.append(ASCIIToUTF16("\tFILE_WRITE_EA\n"));
break;

They’ve got some bitmasks that are referenced via variables (they’re probably #defines, but it doesn’t matter), and when the relevant bits are set, they’d like to write out the name of the bitmask that was set. The bug is in the last pasted block: it repeats the FILE_WRITE_EA check from the block before it. Turns out this could actually be implemented as a function:

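// Parameter types are assumptions; the original sketch left them out.
void check_and_append(unsigned long access, unsigned long mask,
                      string16& output, const std::string& name)
{
  if (access & mask)
    output.append(ASCIIToUTF16("\t" + name + "\n"));
}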
...
check_and_append(access, FILE_WRITE_ATTRIBUTES,
                 output, "FILE_WRITE_ATTRIBUTES");
...

Hmm. Not really an improvement. We’re still duplicating things. And the repetition of access and output is tedious. I’d expect most programmers to copy and paste the function call. Back where we started.

The C macro system happens to be powerful enough to address this case:

#define C_A(a, o, s) \
  do { \
    if ((a) & (s)) \
      (o).append(ASCIIToUTF16("\t" #s "\n")); \
  } while (0)

C_A(access, output, FILE_WRITE_ATTRIBUTES);

will expand to what we want.

This accomplishes our goal, because the macro is able to see the symbol FILE_WRITE_ATTRIBUTES, rather than just the value of 0x00100 or whatever it might happen to be.
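To see the stringification trick in isolation, here’s a minimal self-contained sketch. Everything Chromium-specific is replaced by a stand-in, so treat the names as assumptions: the DEMO_ flags and their values are made up, APPEND_IF_SET and describe_access are hypothetical names, and a plain std::string takes the place of the UTF-16 output and the ASCIIToUTF16 call.

```cpp
#include <string>

// Hypothetical flag values, stand-ins for the real access masks.
#define DEMO_WRITE_ATTRIBUTES 0x0100u
#define DEMO_WRITE_DATA       0x0002u
#define DEMO_WRITE_EA         0x0010u

// #flag turns the macro argument into a string literal without expanding
// it first, so the output carries the flag's name rather than its value.
#define APPEND_IF_SET(access, output, flag)   \
  do {                                        \
    if ((access) & (flag))                    \
      (output).append("\t" #flag "\n");       \
  } while (0)

// Builds a description of the write-related bits set in `access`.
std::string describe_access(unsigned access) {
  std::string output;
  APPEND_IF_SET(access, output, DEMO_WRITE_ATTRIBUTES);
  APPEND_IF_SET(access, output, DEMO_WRITE_DATA);
  APPEND_IF_SET(access, output, DEMO_WRITE_EA);
  return output;
}
```

Each flag is now named exactly once per line, so a pasted line that was never edited shows up as an obvious verbatim duplicate instead of a half-edited pair.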

We could shorten the uses of C_A by having the macro definition refer directly to access and output. But sooner or later, that will trip over standard macro hygiene problems.
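One way the shortened form can go wrong is sketched below, under the same assumptions as before: the flag value is made up, std::string stands in for the real output type, and C_A_SHORT and describe are hypothetical names. The macro silently binds to whichever access and output happen to be in scope at the point of use.

```cpp
#include <string>

#define DEMO_WRITE_DATA 0x0002u  // hypothetical flag value

// Unhygienic: the macro reaches for whatever `access` and `output`
// are visible where it is used, not ones the caller chose to pass.
#define C_A_SHORT(flag)                  \
  do {                                   \
    if (access & (flag))                 \
      output.append("\t" #flag "\n");    \
  } while (0)

std::string describe(unsigned access) {
  std::string output;               // the buffer we mean to fill
  {
    std::string output = "log: ";   // an unrelated local shadows it
    C_A_SHORT(DEMO_WRITE_DATA);     // appends to the inner `output`
  }
  return output;                    // the outer buffer is untouched
}
```

The code compiles without complaint, and the flag name lands in the wrong string.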

Example, Multi Theft Auto

CWaterPoly* CWaterManagerSA::CreateQuad (....)
{
  ....
  pInterface->m_wVertexIDs [ 0 ] = pV1->GetID ();
  pInterface->m_wVertexIDs [ 1 ] = pV2->GetID ();
  pInterface->m_wVertexIDs [ 2 ] = pV3->GetID ();
  pInterface->m_wVertexIDs [ 3 ] = pV4->GetID ();
  ....
}

Ok, again, a C macro could help:

#define setter(i) \
  {pInterface->m_wVertexIDs [ (i) - 1 ] = (pV ## i)->GetID ();}
  ....
  setter(1);
  setter(2);
  setter(3);
  setter(4);
  ....

But, come on, this begs to be in a for loop:

#define setter(i) \
  {pInterface->m_wVertexIDs [ (i) - 1 ] = (pV ## i)->GetID ();}
  ....
  for(int i=1; i<=4; i++) {
    setter(i);
  }
  ....

And, unsurprisingly, that doesn’t work. The preprocessor runs before the compiler ever sees the loop, so pV ## i is pasted into the literal identifier pVi, which doesn’t exist. Perhaps a cleverer C preprocessor could notice that the for loop is static and generate all of the appropriate code. Maybe there’s even some option floating around that I don’t remember that would work with existing preprocessors. Frankly, I’m pleased I remember how to do the stringification and concatenation.
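For reference, here’s a minimal self-contained sketch of the token-pasting version that does work, with literal indices. The Vertex struct, the ids array, and the names SET_VERTEX and collect_ids are assumptions standing in for the real Multi Theft Auto types, which aren’t shown in the excerpt.

```cpp
#include <array>

// Hypothetical stand-in for the vertex type in the example.
struct Vertex {
  int id;
  int GetID() const { return id; }
};

// Token pasting happens in the preprocessor: SET_VERTEX(2) becomes
// ids[(2) - 1] = (pV2)->GetID(). Passing a loop variable named i would
// paste to pVi instead, which is not a declared identifier.
#define SET_VERTEX(n) (ids[(n) - 1] = (pV##n)->GetID())

std::array<int, 4> collect_ids(const Vertex* pV1, const Vertex* pV2,
                               const Vertex* pV3, const Vertex* pV4) {
  std::array<int, 4> ids{};
  SET_VERTEX(1);
  SET_VERTEX(2);
  SET_VERTEX(3);
  SET_VERTEX(4);
  return ids;
}
```

The index and the variable suffix can no longer disagree, but the four calls still have to be written out by hand.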

We could write a Racket macro that covers this:

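(define-syntax (assign stx)
  ;; Extract the number at the end of an identifier: pV3 -> 3.
  (define (trailing-digits expr)
    (string->number
      (regexp-replace* #px"^.*?(\\d+)$"
                       (symbol->string (syntax->datum expr))
                       "\\1")))
  (syntax-case stx ()
    [(_ target source ...)
     ;; Compute one index per source identifier at expansion time.
     (with-syntax ([(i ...)
                    (map trailing-digits
                         (syntax->list #'(source ...)))])
       ;; Emit one vector-set! per source, at position (index - 1).
       #'(begin
           (vector-set! target (sub1 i) source)
           ...))]))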

That’s not short. But if you’re going to be writing a lot of code where you want to pull values out of variables with numbers in their names, and stuff those values into corresponding positions in arrays, wouldn’t it be nice to have? How many times would you need to use it before it became worth the effort, in order to avoid the copying and pasting?

How much effort would it be worth to eliminate a class of errors?

Finally

Refer back to the final sentence of the first paragraph quoted from Andrey: “there is often no better alternative to be found”. What? We’re programming! We’re providing instructions to a machine, which will be faithfully executed on our behalf, regardless of how tedious and repetitive they may be, and there’s no better alternative than copying and pasting?

When the Racket Manifesto discusses Racket’s design principles, the authors write:

When programmers must resort to extra-linguistic mechanisms to solve a problem, the chosen language has failed them.

I assert that copying and pasting code is an extra-linguistic mechanism for expressing repetition that the language itself cannot express. When you are left with no choice but to copy and paste code, your language has failed you.