I am embarrassed that after writing 72,650 words about MLX 2.0 for last week’s post, I left something out. Specifically, I didn’t include code for your own simulation of the checksum routine on a more modern platform. Here’s a function that carries out the calculations of the Commodore 64/128 or Apple II versions of MLX 2.0. It’s written in Octave, the open-source Matlab-like numerical computation environment. If you can read this, though, you can translate it to whatever language you find convenient.

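function [retval] = mlxII (oneline)
# oneline = [address, eight data values, typed checksum], all in decimal.
# Returns true if MLX 2.0's computed checksum matches the typed one.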
z2 = 2;
z4 = 254;
z5 = 255;
z6 = 256;
z7 = 127;
address = oneline(1);
entries = oneline(2:9);
checksum = oneline(10);
ck = 0;
ck = floor(address/z6);
ck = address-z4*ck + z5*(ck>z7)*(-1);
ck = ck + z5*(ck>z5)*(-1);
#
# This looks like but is not the sum mod 255.
# The 8-bit computers did not have a mod function and
# used this subtraction instead.
#
for i=1:length(entries),
ck = ck*z2 + z5*(ck>z7)*(-1) + entries(i);
ck = ck + z5*(ck>z5)*(-1);
endfor
#
# The checksum *can* be 255 (0xFF), but not 0 (0x00)!
# Using the mod function could make zeroes appear
# where 255's should.
#
retval = (ck == checksum);
endfunction

This reproduces the code as it was actually coded. Here’s a version that relies on Octave or Matlab’s ability to use modulo operations:

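function [retval] = mlxIIslick (oneline)
# Same input and output as mlxII, but using mod() instead of the
# subtract-255 filters. (The two disagree only for an all-zero line at address 0.)
factors = 2.^(7:-1:0);   # weights 128, 64, ..., 1 for the eight entries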
address = oneline(1);
entries = oneline(2:9);
checksum = oneline(10);
ck = 0;
ck = mod(address - 254*floor(address/256), 255);
ck = ck + sum(entries.*factors);
ck = mod(ck, 255);
ck = ck + 255*(ck == 0);
retval = (ck == checksum);
endfunction
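If Octave isn’t handy, here is a sketch of both functions in Python. It assumes the same input as the Octave versions: a list of ten decimal numbers, the address, the eight data values, and the typed checksum. The Python function names are my own. Python comparisons count true as 1 where Commodore BASIC used −1, so the corrections are written as subtractions, the same way the Octave code multiplies by (-1).

```python
import random


def mlx2_check(oneline):
    """Literal port of mlxII: True if the typed checksum matches MLX 2.0's."""
    address, entries, checksum = oneline[0], oneline[1:9], oneline[9]
    hi = address // 256
    # Base checksum from the address (line 370 of the Commodore listing).
    ck = address - 254 * hi - 255 * (hi > 127)
    ck -= 255 * (ck > 255)
    # Fold in each data value (lines 380 and 390).
    for b in entries:
        ck = 2 * ck - 255 * (ck > 127) + b
        ck -= 255 * (ck > 255)
    return ck == checksum


def mlx2_check_slick(oneline):
    """Port of mlxIIslick: the same test using the modulo operator."""
    address, entries, checksum = oneline[0], oneline[1:9], oneline[9]
    factors = [2 ** p for p in range(7, -1, -1)]  # 128, 64, ..., 1
    ck = (address - 254 * (address // 256)) % 255
    ck = (ck + sum(b * f for b, f in zip(entries, factors))) % 255
    if ck == 0:
        ck = 255  # MLX produces FF here, never 00
    return ck == checksum


# Cross-check on random lines: both accept exactly the same checksums.
random.seed(1985)
for _ in range(300):
    address = random.randint(1, 0xFFFF)
    data = [random.randint(0, 255) for _ in range(8)]
    for typed in range(256):
        line = [address] + data + [typed]
        assert mlx2_check(line) == mlx2_check_slick(line)
```

Starting the random addresses at 1 skips the one case where the two versions differ, an all-zero line at address 0.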

Enjoy! Please don’t ask when I’ll have the Automatic Proofreader solved.

A couple months ago I worked out a bit of personal curiosity. This was about how MLX worked. MLX was a program used in Compute! and Compute!’s Gazette magazine in the 1980s, so that people entering machine-language programs could avoid errors. There were a lot of fine programs, some of them quite powerful, free for the typing-in. The catch is this involved typing in a long string of numbers, and if any were wrong, the program wouldn’t work.

So MLX, introduced in late 1983, was a program to make typing in programs better. You would enter in a string of six numbers — six computer instructions or data — and a seventh, checksum, number. Back in January I worked out finally what the checksum was. It turned out to be simple. Take the memory location of the first of your set of six instructions, modulo 256. Add to it each of the six instructions, modulo 256. That’s the checksum. If it doesn’t match the typed-in checksum, there’s an error.

There are weaknesses to this, though. It’s vulnerable to transposition errors: if you were supposed to type in 169 002 and put in 002 169 instead, it wouldn’t be caught. It’s also vulnerable to casual typos: 141 178 gives the same checksum as 142 177.
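Here’s a sketch of that original checksum in Python, with the two kinds of slip just described. The function name and the filler entries are made up for illustration; only the 169 002 and 141 178 pairs come from the text, and 49152 is just a common starting address.

```python
def original_mlx_checksum(address, data):
    """Original 1983 MLX checksum as described above: the starting
    address plus the six data values, all modulo 256."""
    return (address + sum(data)) % 256


typed = [169, 2, 141, 178, 96, 0]    # made-up entries for illustration
swapped = [2, 169, 141, 178, 96, 0]  # 169 002 entered as 002 169
typo = [169, 2, 142, 177, 96, 0]     # 141 178 entered as 142 177

# Neither slip changes the checksum, so MLX can't catch either.
print(original_mlx_checksum(49152, typed) == original_mlx_checksum(49152, swapped))  # True
print(original_mlx_checksum(49152, typed) == original_mlx_checksum(49152, typo))     # True
```

Because the checksum is a plain sum, any rearrangement of the same numbers, or any pair of typos that cancel out, slips through.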

Which is all why the original MLX lasted only two years.

What Was The New MLX?

The New MLX, also called MLX 2.0, appeared first in the June 1985 Compute!. This was a version for the Apple II. Six months later a version for the Commodore 64 got published, again in Compute!, though it ran in Compute!’s Gazette too. Compute! was for all the home computers of the era; Compute!’s Gazette specialized in the Commodore computers. I would have sworn that MLX got adapted for the Atari eight-bit home computers too, but can’t find evidence it ever was. By 1986 Compute! was phasing out its type-in programs and didn’t run much for Atari anymore.

The new MLX made a bunch of changes. Some were internal, about how to store a program being entered. One was dramatic in appearance. In the original MLX people typed in decimal numbers, like 32 or 169. In the new, they would enter hexadecimal digits, like 20 or A9. And a string of eight numbers on a line, rather than six. This promised to save our poor fingers. Where before we needed to type in 21 digits to enter six instructions, now we needed 18 digits to enter eight instructions. So the same program would take about two-thirds the number of keystrokes. Each line in the listing would show a four-digit hexadecimal address, then the eight two-digit numbers, then a two-digit checksum.

I had a Commodore 64, so I always knew MLX from its Commodore version. The key parts of the checksum code appear in it in lines 350 through 390. Let me copy out the key code, spaced a bit out for easier reading:

360 A = INT(AD/Z6):
GOSUB 350:
A = AD - A*Z6:
GOSUB 350:
PRINT":";
370 CK = INT(AD/Z6):
CK = AD - Z4*CK + Z5*(CK>Z7):
GOTO 390
380 CK = CK*Z2 + Z5*(CK>Z7) + A
390 CK = CK + Z5*(CK>Z5):
RETURN

Z2, Z4, Z5, Z6, and Z7 are constants, defined at the start of the program. Z4 equals 254, Z5 equals 255, Z6 equals 256, and Z7, as you’d expect, is 127. Z2, meanwhile, was a simple 2.

A bit of Commodore BASIC here. INT means to take the largest whole number not larger than whatever’s inside. AD is the address of the start of the line being entered. CK is the checksum. A is one number, one machine language instruction, being put in. GOSUB, “go to subroutine”, means to jump to another line and execute commands from there until reaching a RETURN command. The program then continues from the next instruction after the GOSUB. In this code, line 350 converts a number from decimal to hexadecimal and prints out the hexadecimal version. This bit about adding Z5 * (CK>Z7) looks peculiar.

Commodore BASIC evaluates logical expressions like CK > Z7 into a bit pattern. That pattern looks like a number. We can use it like an integer. Many programming languages do something like that and it can allow for clever but cryptic programming tricks. An expression that’s false evaluates as 0; an expression that’s true evaluates as -1. So, CK + Z5*(CK>Z5) is an efficient little filter. If CK is no larger than Z5, it’s left untouched. If CK is larger than Z5, then subtract Z5 from CK. This keeps CK from being more than 255, exactly as we’d wanted.
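To see the trick in action, here is a small Python sketch that mimics BASIC’s −1-for-true comparisons. The helper names basic_gt and filter_255 are mine, for illustration only.

```python
Z5 = 255


def basic_gt(x, y):
    """Mimic Commodore BASIC: a true comparison is -1, a false one is 0."""
    return -1 if x > y else 0


def filter_255(ck):
    """CK = CK + Z5*(CK>Z5): subtract 255 only when CK is above 255."""
    return ck + Z5 * basic_gt(ck, Z5)


print([filter_255(ck) for ck in (100, 255, 256, 400)])  # [100, 255, 1, 145]
```

Note that 255 itself passes through untouched; only values of 256 or more get pulled down.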

But you also notice: this code makes no sense.

Like, starting the checksum with something derived from the address makes sense. Adding to that numbers based on the instructions makes sense. But the last instruction of line 370 is a jump straight to line 390. Line 380, where any of the actual instructions are put into the checksum, never gets called. Also, there are eight instructions per line. Why does line 380 handle only one?

And this was a bear to work out. One friend insisted I consider the possibility that MLX was buggy and nobody had found the defect. I could not accept that, not for a program that was so central to so much programming for so long. Also, not considering that it worked. Make almost any entry error and the checksum would not match.

Where’s the rest of the checksum formula?

This is what took time! I had to go through the code and find what other lines call lines 360 through 390. There are a hundred lines of code in the Commodore version of MLX, which isn’t that much. They jump around a lot, though. By my tally 68 of these 100 lines jump to, or can jump to, something besides the next line of code. I don’t know how that compares to modern programming languages, but it’s still dizzying. For a while I thought it might be a net saving in time to write something that would draw a directed graph of the program’s execution flow. It might still be worth doing that.

The checksum formula gets called by two pieces of code. One of them is the code when the program gets entered. MLX calculates a checksum and verifies whether it matches the ninth number entered. The other role is in printing out already-entered data. There, the checksum doesn’t have a role, apart from making the on-screen report look like the magazine listing.

Here’s the code that calls the checksum when you’re entering code:

440 POKE 198,0:
GOSUB 360:
IF F THEN PRINT IN$: PRINT" ";
[ many lines about entering your data here ]
560 FOR I=1 TO 25 STEP 3:
B$ = MID$(IN$, I):
GOSUB 320:
IF I<25 THEN GOSUB 380: A(I/3)=A
570 NEXT:
IF A<>CK THEN GOSUB 1060:
PRINT "ERROR: REENTER LINE ":
F = 1:
GOTO 440
580 GOSUB 1080:
[ several more lines setting up a new line of data to enter ]

Line 320 started the routine that turned a hexadecimal number, such as 7F, into decimal, such as 127. It returns this number as the variable named A. IN$ was the input text, part of the program you enter. This should be 27 characters long. A(I/3) was an element in an array, the string of eight instructions for that entry. Yes, you could use the same name for an array and for a single, unrelated, number. Yes, this was confusing.

But here’s the logic. Line 440 starts work on your entry. It calculates the part of the checksum that comes from the location in memory that data’s entered in. Line 560 does several bits of work. It takes the entered instructions and converts the strings into numbers. Then it takes each of those instruction numbers and adds its contribution to the checksum. Line 570 compares whether the entered checksum matches the computed checksum. If it does match, good. If it doesn’t match, then go back and re-do the entry.
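The entry logic of lines 440 through 570 can be sketched in Python like this. It’s a paraphrase, not a translation: the function name verify_entry and the input format (nine space-separated two-digit hex numbers, as printed in the magazine listings) are my assumptions, and Python’s int(..., 16) stands in for the hexadecimal conversion of line 320.

```python
def verify_entry(address, typed):
    """Check one typed line the way lines 440-570 do.

    `typed` holds nine two-digit hex numbers separated by spaces: eight
    data values and then the checksum, e.g. "20 A9 00 8D 53 1E A0 00 FF".
    Returns (ok, data).
    """
    numbers = [int(tok, 16) for tok in typed.split()]  # role of GOSUB 320
    if len(numbers) != 9:
        raise ValueError("expected eight data values and a checksum")
    data, entered = numbers[:8], numbers[8]
    # Line 440, GOSUB 360: the part of the checksum from the address.
    hi = address // 256
    ck = address - 254 * hi - 255 * (hi > 127)
    ck -= 255 * (ck > 255)
    # Line 560, GOSUB 380 for each data value.
    for b in data:
        ck = 2 * ck - 255 * (ck > 127) + b
        ck -= 255 * (ck > 255)
    # Line 570: a mismatch means "ERROR: REENTER LINE".
    return ck == entered, data


print(verify_entry(0x0848, "20 A9 00 8D 53 1E A0 00 FF")[0])  # True
print(verify_entry(0x0848, "A9 20 00 8D 53 1E A0 00 FF")[0])  # False
```

The second call swaps the first two numbers, and the checksum catches it.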

The code for displaying a line of your machine language program is shorter:

630 GOSUB 360:
B = BS + AD - SA:
FOR I = B TO B+7:
A = PEEK(I):
GOSUB 350:
GOSUB 380:
PRINT S$;
640 NEXT:
PRINT "";
A = CK:
GOSUB 350:
PRINT

The bit about PEEK is looking into the buffer, which holds the entered instructions, and reading what’s there. The GOSUB 350 takes the number ‘A’ and prints out its hexadecimal representation. GOSUB 360 calculates the part of the checksum that’s based on the memory location. The GOSUB 380 contributes the part based on every instruction. S$ is a space. It’s used to keep all the numbers from running up against each other.
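A Python sketch of the same display job follows. The buffer layout and the function name list_line are assumptions for illustration, and the output copies the layout of the magazine listings rather than whatever exactly MLX put on screen.

```python
def list_line(buffer, address, start):
    """Format one stored line as address, eight bytes, and checksum.

    buffer is a bytes object; start is the index of the byte for `address`
    (the role of B = BS + AD - SA in line 630).
    """
    data = buffer[start:start + 8]                # the PEEK(I) loop
    hi = address // 256
    ck = address - 254 * hi - 255 * (hi > 127)    # GOSUB 360
    ck -= 255 * (ck > 255)
    for b in data:                                # GOSUB 380 per byte
        ck = 2 * ck - 255 * (ck > 127) + b
        ck -= 255 * (ck > 255)
    body = " ".join(f"{b:02X}" for b in data)     # GOSUB 350 per byte
    return f"{address:04X}: {body} {ck:02X}"


print(list_line(bytes.fromhex("20A9008D531EA000"), 0x0848, 0))
# 0848: 20 A9 00 8D 53 1E A0 00 FF
```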

So what is the checksum formula?

The checksum comes in two parts. The first part is based on the address at the start of the line. Let me call that number $A$. The second part is based on the entry, the eight instructions following the address. Let me call them $B_1$ through $B_8$.

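The base of the checksum, which I’ll call $ck_0$, is:

$$ c = A - 254\left\lfloor \frac{A}{256} \right\rfloor - 255\left[\left\lfloor \frac{A}{256} \right\rfloor > 127\right] $$

$$ ck_0 = c - 255\,[c > 255] $$

Here a bracketed statement like $[c > 255]$ counts as 1 if it’s true and 0 if it’s false, and $c$ is just a working value. This is line 370 followed by line 390, with Commodore BASIC’s true of −1 turned into a subtraction.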

For example, suppose the address is 49152 (in hexadecimal, C000), which was popular for Commodore 64 programming. Then $ck_0$ would be 129. If the address is 2049 (in hexadecimal, 0801), another popular location, $ck_0$ would be 17. Within a page of 256 addresses, $ck_0$ increases by 1 as the memory address for the start of a line increases. If you entered a line that started at memory address 49153 (hexadecimal C001) for some reason, $ck_0$ would be 130. A line which started at address 49154 (hexadecimal C002) would have $ck_0$ start at 131. This progression continues until $ck_0$ would reach 256. Then that greater-than filter at the end of the expression intrudes. A line starting at memory address 49278 (C07E) has $ck_0$ of 255, and one starting at memory address 49279 (C07F) has $ck_0$ of 1. I see a reason behind this choice, which I’ll get to.
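These examples are easy to check with a short Python sketch of the base checksum, the part computed from the address alone. The function name base_checksum is mine.

```python
def base_checksum(address):
    """ck_0 for MLX 2.0, following line 370 of the Commodore listing."""
    hi = address // 256                          # INT(AD/Z6)
    ck = address - 254 * hi - 255 * (hi > 127)
    return ck - 255 * (ck > 255)                 # the filter in line 390


for addr in (0xC000, 0x0801, 0xC001, 0xC002, 0xC07E, 0xC07F):
    print(f"{addr:04X} -> {base_checksum(addr)}")
```

This prints 129, 17, 130, 131, 255, and 1 for the six addresses, matching the values above.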

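That’s the starting point. Now to use the actual data, the eight pieces $B_1$ through $B_8$ that are the actual instructions. The easiest way for me to describe this is as a loop, using $ck_0$ to calculate $ck_1$, and $ck_1$ to define $ck_2$, and so on. For each $n$ from 1 to 8:

$$ c = 2\,ck_{n-1} - 255\left[ck_{n-1} > 127\right] + B_n $$

$$ ck_n = c - 255\,[c > 255] $$

This is lines 380 and 390, run once per instruction.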

That is, for each piece of data in turn, double the existing checksum; if the doubled value would be 256 or larger, subtract 255 from it. Then add the next piece of data. If this sum is 256 or larger, subtract 255 from it. Thanks to that subtract-255 rule after the doubling, the working sum never gets larger than 510, and the subtract-255 rule after adding $B_n$ brings it back to 255 or less. Repeat through the eighth piece of data. That last calculated checksum, $ck_8$, is the checksum for the entry. If $ck_8$ matches the entered checksum, go on to the next entry. If $ck_8$ does not match the entered checksum, give a warning and go back and re-do the entry.
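Here is a Python sketch that records every intermediate value, the base checksum from the address followed by the running checksum after each of the eight data values. The example is the Speedscript 3.0 line from the June 1985 Compute! quoted further on; the function name is mine.

```python
def checksum_trace(address, data):
    """Return [ck_0, ck_1, ..., ck_8] for one MLX 2.0 line."""
    hi = address // 256
    ck = address - 254 * hi - 255 * (hi > 127)
    ck -= 255 * (ck > 255)
    trace = [ck]
    for b in data:
        ck = 2 * ck - 255 * (ck > 127)  # double; fold back if old ck > 127
        ck += b                         # add the next data value (<= 510)
        ck -= 255 * (ck > 255)          # fold back again
        trace.append(ck)
    return trace


print(checksum_trace(0x0848, [0x20, 0xA9, 0x00, 0x8D, 0x53, 0x1E, 0xA0, 0x00]))
# [88, 208, 75, 150, 186, 200, 175, 255, 255]
```

The last value, 255, is hexadecimal FF, the checksum printed in the magazine.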

Why was MLX written like that?

There are mysterious bits to this checksum formula. First is where it came from. It’s not, as far as I can tell, a standard error-checking routine, or if it is it’s presented in a form I don’t recognize. But I know only small pieces of information theory, and it might be that this is equivalent to a trick everybody knows.

The formula is, at heart, “double your working sum and add the next instruction, and repeat”. At the end, take the sum modulo 255 so that the checksum is no more than two hexadecimal digits. Almost. In studying the program I spent a lot of time on a nearly-functionally-equivalent code that used modulo operations. I’m confident that if Apple II and Commodore BASIC had modulo functions, then MLX would have used them.

But those eight-bit BASICs did not. Instead the programs tested whether the working checksum had gotten larger than 255, and if it had, then subtracted 255 from it. This is a little bit different. It is possible for a checksum to be 255 (hexadecimal FF). This even happened. In the June 1985 Compute!, introducing the new MLX for the Apple II, we have this entry as part of the word processor Speedscript 3.0 that anyone could type in:

0848: 20 A9 00 8D 53 1E A0 00 FF

What we cannot have is a checksum of 0. (Unless a program began at memory location 0, and had instructions of nothing but 0. This would not happen. The Commodore 64, and the Apple II, used those low-address memory locations for system work. No program could use them.) Were the formulas written with modulo operations, we’d see 00 where we should see FF.
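A quick Python check shows the FF-versus-00 difference on that Speedscript line. Both functions below are my own sketches: they add up the same doubled sum, but one reduces it with an ordinary modulo (giving 0 to 254) and the other keeps MLX’s range of 1 to 255.

```python
def mlx_style(address, data):
    """Modular equivalent of MLX 2.0, keeping results in 1..255."""
    ck = address - 254 * (address // 256)
    for b in data:
        ck = 2 * ck + b
    return (ck - 1) % 255 + 1  # a multiple of 255 comes out as 255, never 0


def plain_mod(address, data):
    """The same sum reduced with an ordinary modulo: results in 0..254."""
    ck = address - 254 * (address // 256)
    for b in data:
        ck = 2 * ck + b
    return ck % 255


speedscript = [0x20, 0xA9, 0x00, 0x8D, 0x53, 0x1E, 0xA0, 0x00]
print(f"{mlx_style(0x0848, speedscript):02X}")  # FF
print(f"{plain_mod(0x0848, speedscript):02X}")  # 00
```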

Doubling the working sum and then setting it to be in a valid range — from 1 to 255 — is easy enough. I don’t know how the designer settled on doubling, but have hypotheses. It’s a good scheme for catching transposition errors, entering 20 FF D2 where one means to enter 20 D2 FF.

The initial $ck_0$ seems strange. The equivalent step for the original MLX was the address on which the entry started, modulo 256. Why the change?

My hypothesis is this change was to make it harder to start typing in the wrong entry. The code someone typed in would be long columns of numbers, for many pages. The text wasn’t backed by alternating bands of color, or periodic breaks, or anything else that made it harder for the eye to skip one or more lines of machine language code.

In the original MLX, skipping one line, or even a couple lines, can’t go undetected. The original MLX entered six pieces of data at a time. If your eye skips a line, the wrong data will mismatch the checksum by 6, or by 12, or by 18 — by 6 times the number of lines you miss. To have the checksum not catch this error, you have to skip 128 lines, and that’s not going to happen. That’s about one and a quarter columns of text and the eye just doesn’t make that mistake. Skimming down a couple lines, yes. Moving to the next column, yes. Next column plus 37 lines? No.

In the new MLX, one enters eight instructions of code at a time. So skipping a line increases the address by 8 times the number of lines skipped. If the initial checksum were the line’s starting address modulo 256, then we’d only need to skip 32 lines to get the same initial checksum. Thirty-two lines is a bit much to skip, but it’s under a third of a column. That’s not too far. And the eye could see 0968 where it means to read 0868. That’s a plausible enough error and one an address-modulo-256 checksum would be helpless against.

So the more complicated, and outright weird, formula that MLX 2.0 uses betters this. Skipping 32 lines (entering the line for 0968 instead of 0868) increases the base checksum by 2. Combined with the subtract-255 rule, you won’t get a duplicate of the base checksum for, in most cases, 127 lines. Nobody is going to make that error.
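A brute-force Python sketch can confirm these line counts: starting from one line, keep skipping ahead until the address-based part of the checksum repeats. The three scenarios are the original MLX (six bytes per line, address modulo 256), a hypothetical eight-byte version that kept the address modulo 256, and MLX 2.0 starting at line 0811. The helper names are mine.

```python
def mlx2_base(address):
    """MLX 2.0's base checksum ck_0 from the line's starting address."""
    hi = address // 256
    ck = address - 254 * hi - 255 * (hi > 127)
    return ck - 255 * (ck > 255)


def mod256_base(address):
    """The original MLX's starting value: the address modulo 256."""
    return address % 256


def lines_until_repeat(base, start, bytes_per_line):
    """Smallest number of lines skipped that gives the same base checksum."""
    k = 1
    while base(start + k * bytes_per_line) != base(start):
        k += 1
    return k


print(lines_until_repeat(mod256_base, 0x0800, 6))  # original MLX: 128
print(lines_until_repeat(mod256_base, 0x0800, 8))  # mod 256, 8 per line: 32
print(lines_until_repeat(mlx2_base, 0x0811, 8))    # MLX 2.0: 127
```

The MLX 2.0 count depends a little on where in a 256-byte page the line starts. For lines starting at page offset 8 or beyond, which is all but one line in each page, it comes out to 127.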

So this explains the components. Why is the Commodore 64 version of MLX such a tangle of spaghetti code?

Here I have fewer answers. Part must be that Commodore BASIC was prone to creating messes. For example, it did not really have functions, smaller blocks of code with their own, independent, sets of variables. These would let, say, numbers convert from hexadecimal to decimal without interrupting the main flow of the program. Instead you had to jump, either by GOTO or GOSUB, to another part of the program. The Commodore or Apple II BASIC subroutine has to use the same variable names as the main part of the program, so, pick your variables wisely! Or do a bunch of reassigning values before and after the subroutine’s called.

To be precise, Commodore BASIC did let one define some functions. This by using the DEF FN command. It could take one number as the input, and return one number as output. The whole definition of the function couldn’t be more than 80 characters long. It couldn’t have a loop. Given these constraints, you can see why user-defined functions went all but unused.

The Commodore version jumps around a lot. Of its 100 lines of code, 68 jump or can jump to somewhere else. The Apple II version has 52 lines of code, 28 of which jump or can jump to another line. That’s just over 50 percent of the lines. I’m not sure how much of this reflects Apple II’s BASIC being better than Commodore’s. Commodore 64 BASIC we can charitably describe as underdeveloped. The Commodore 128 version of MLX is a bit shorter than the 64’s (90 lines of code). I haven’t analyzed it to see how much it jumps around. (But it does have some user-defined functions.)

The most mysterious element, to me, is the defining of some constants like Z2, which is 2, or Z5, which is 255. The Apple version of this doesn’t use these constants. It uses 2 or 255 or such in the checksum calculation. I can rationalize replacing 254 with Z4, or 255 with Z5, or 127 with Z7. The Commodore 64 allowed only 80 characters in a command line. So these values might save only a couple characters, but if they’re needed characters, good. Z2, though, only makes the line longer.

I would have guessed that this reflected experiments. That is, trying out whether one should double the existing sum and add a new number, or triple, or quadruple, or even some more complicated rule. But the Apple II version appeared first, and has the number 2 hard-coded in. This might reflect that Tim Victor, author of the Apple II version, preferred to clean up such details while Ottis R Cowper, writing the Commodore version, did not. Lacking better evidence, I have to credit that to style.

Is this checksum any good?

Whether something is “good” depends on what it is supposed to do. The New MLX, or MLX 2.0, was supposed to make it possible to type in long strings of machine-language code while avoiding errors. So it’s good if it protects against those errors without being burdensome.

It’s a light burden. The person using this types in 18 keystrokes per line. This carries eight machine-language instructions plus one checksum number. So only one-ninth of the keystrokes are overhead, things to check that other work is right. That’s not bad. And it’s better than the original version of MLX, where up to 21 keystrokes gave six instructions. And one-seventh of the keystrokes were the checksum overhead.
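The keystroke arithmetic, as a tiny Python sketch with exact fractions. For the original MLX it counts the maximum of three digits per decimal number.

```python
from fractions import Fraction

# Original MLX, counting the maximum of three digits per decimal number:
# six data values plus a checksum.
old_keys, old_data = 7 * 3, 6
# MLX 2.0: eight two-digit hex values plus a two-digit checksum.
new_keys, new_data = 9 * 2, 8

old_overhead = Fraction(3, old_keys)
new_overhead = Fraction(2, new_keys)
# Keystrokes per instruction, new scheme relative to old.
ratio = Fraction(new_keys, new_data) / Fraction(old_keys, old_data)

print(old_overhead, new_overhead, ratio)  # 1/7 1/9 9/14
```

Nine-fourteenths is about 0.64, the “about two-thirds the number of keystrokes” the new format promised.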

The checksum quite effectively guards against entering instructions on a wrong line. To get the same checksum that (say) line 0811 would have you need to jump to line 0C09. In print, that’s another column over and a third of the way down the page. It’s a hard mistake to make.

Entering a wrong number in the instructions — say, typing in 22 where one means 20 — gets caught. The difference gets multiplied by some whole power of two in the checksum. Which power depends on what number’s entered wrong. If the eighth instruction is entered wrong, the checksum is off by that error. If the seventh instruction is wrong, the checksum is off by two times that error. If the sixth instruction is wrong, the checksum is off by four times that error. And so on, so that if the first instruction is wrong, the checksum is off by 128 times that error. And these errors are taken not-quite-modulo 255.

The only way to enter a single number wrong without the checksum catching it is to type something 255 higher or lower than the correct number. And MLX confines you to entering a two-hexadecimal-digit number, that is, a number from 0 to 255. So the only single-number mistake that can slip through is entering 00 where you mean FF, or FF where you mean 00.
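Here is a brute-force Python check of that claim: try every wrong value in every position of one line and collect the mistakes the checksum misses. The data is illustrative (the Speedscript line with its last byte changed to FF so that both a 00 and an FF appear).

```python
def mlx2_checksum(address, data):
    """MLX 2.0 checksum, following lines 370-390 of the Commodore listing."""
    hi = address // 256
    ck = address - 254 * hi - 255 * (hi > 127)
    ck -= 255 * (ck > 255)
    for b in data:
        ck = 2 * ck - 255 * (ck > 127) + b
        ck -= 255 * (ck > 255)
    return ck


line = [0x20, 0xA9, 0x00, 0x8D, 0x53, 0x1E, 0xA0, 0xFF]  # illustrative data
good = mlx2_checksum(0x0848, line)

# Try every single wrong value in every position; keep the ones MLX misses.
missed = []
for pos in range(8):
    for wrong in range(256):
        if wrong != line[pos]:
            typo = line[:pos] + [wrong] + line[pos + 1:]
            if mlx2_checksum(0x0848, typo) == good:
                missed.append((pos, line[pos], wrong))

print(missed)  # [(2, 0, 255), (7, 255, 0)]
```

Out of 2,040 possible single-number mistakes, the only two that get through are the 00 turned into FF and the FF turned into 00.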

What about transpositions? Here, the new MLX checksum shines. Doubling the sum so far and adding a new term to it makes transpositions very likely to be caught. Not every one, though. A transposition of the data at position number $j$ and at position number $k$ (with $j < k$) will go unnoticed only when $B_j$ and $B_k$ happen to make true

$$ \left(B_j - B_k\right)\left(2^{k-j} - 1\right) \equiv 0 \pmod{255} $$

This doesn’t happen much. It needs $B_j$ and $B_k$ to be 255 apart. Or for $2^{k-j} - 1$ to share a factor with 255, and for $B_j - B_k$ to supply the rest of 255’s factors. I’ll discuss when that happens in the next section.
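Where does the transposition condition come from? Here’s a sketch of the reasoning, writing $B_1, \dots, B_8$ for the eight data values and $ck_0$ for the starting value from the address. Each subtract-255 step leaves the checksum unchanged modulo 255, and $256 \equiv 1 \pmod{255}$, so

$$ ck_8 \equiv 2^8\,ck_0 + \sum_{n=1}^{8} 2^{8-n} B_n \equiv ck_0 + \sum_{n=1}^{8} 2^{8-n} B_n \pmod{255} $$

Swapping $B_j$ and $B_k$, with $j < k$, changes that sum by

$$ (B_k - B_j)\,2^{8-j} + (B_j - B_k)\,2^{8-k} = (B_k - B_j)\,2^{8-k}\left(2^{k-j} - 1\right) $$

Since 2 has an inverse modulo 255, the factor $2^{8-k}$ can’t help make a multiple of 255. And since the computed checksum always lands between 1 and 255 (for any address but 0), two lines share a checksum exactly when these sums agree modulo 255. So the swap goes unnoticed exactly when $(B_j - B_k)(2^{k-j} - 1) \equiv 0 \pmod{255}$. With $255 = 3 \cdot 5 \cdot 17$, the values of $2^{k-j} - 1$ for gaps of 1 through 7 are 1, 3, 7, 15, 31, 63, and 127. Only 3 and 63 (which share the factor 3) and 15 (which shares 3 and 5) have anything in common with 255.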

In practice, this is a great simple checksum formula. It isn’t hard to calculate, it catches most of the likely data-entry mistakes, and it doesn’t require much extra data entry to work.

What flaws did the checksum have?

The biggest flaw the MLX 2.0 checksum scheme has is that it’s helpless to distinguish FF, the number 255, from 00, the number 0. It’s so vulnerable to this that a warning got attached to the MLX listing in every issue of the magazines:

Because of the checksum formula used, MLX won’t notice if you accidentally type FF in place of 00, and vice versa. And there’s a very slim chance that you could garble a line and still end up with a combination of characters that adds up to the proper checksum. However, these mistakes should not occur if you take reasonable care while entering data.

So when can a transposition go wrong? Well, any time you swap a 00 and an FF on a line, however far apart they are. But also if you swap the elements in positions $j$ and $k$, when $2^{k-j} - 1$ shares a factor with 255 and $B_j - B_k$ makes up the rest, so that their product is a multiple of 255.

For a transposition of adjacent instructions to go wrong — say, the third and the fourth numbers in a line — you need the third and fourth numbers to be 255 apart. That is, entering 00 FF where you mean FF 00 will go undetected. But that’s the only possible case for adjacent instructions.

A transposition past one space (say, swapping the third and the fifth numbers in a line) needs the two to be 85, 170, or 255 apart. So, if you were supposed to enter (in hexadecimal) EE A9 44 and you instead entered 44 A9 EE, it would go undetected. That’s the only way a one-space transposition can slip through. MLX will catch entering EE A9 45 as 45 A9 EE.

A transposition past two spaces (say, swapping the first and the fourth numbers) will always be caught unless the numbers are 255 apart, that is, a 00 and an FF. A transposition past three spaces (like swapping the first and the fifth numbers) is vulnerable again. Then if the first and fifth numbers are off by 17 (or a multiple of 17) the swap will go unnoticed. A transposition across four spaces will always be caught unless it’s 00 for FF. A transposition across five spaces (like swapping the second and eighth numbers) has to also have the two numbers be 85 or 170 or 255 apart to sneak through. And a transposition across six spaces (this has to be swapping the first and last elements in the line) again will be caught unless it’s 00 for FF.
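This list can be checked by brute force in Python. The sketch below tries every pair of values at positions a given gap apart (a gap of 2 is “past one space”, a gap of 7 is first-and-last) and records which differences the checksum misses. Holding the other entries at 00 and using address 0848 are arbitrary choices; the result doesn’t depend on them.

```python
def mlx2_checksum(address, data):
    """MLX 2.0 checksum, following lines 370-390 of the Commodore listing."""
    hi = address // 256
    ck = address - 254 * hi - 255 * (hi > 127)
    ck -= 255 * (ck > 255)
    for b in data:
        ck = 2 * ck - 255 * (ck > 127) + b
        ck -= 255 * (ck > 255)
    return ck


def hidden_differences(gap, address=0x0848):
    """Differences y - x for which swapping x and y, `gap` positions apart,
    leaves the checksum unchanged. Other entries are held at 00."""
    hidden = set()
    for x in range(256):
        for y in range(x + 1, 256):
            line = [0] * 8
            line[0], line[gap] = x, y
            swapped = [0] * 8
            swapped[0], swapped[gap] = y, x
            if mlx2_checksum(address, line) == mlx2_checksum(address, swapped):
                hidden.add(y - x)
    return sorted(hidden)


gaps = {gap: hidden_differences(gap) for gap in range(1, 8)}
for gap, diffs in gaps.items():
    print(gap, diffs)
```

Gaps of 1, 3, 5, and 7 hide only a difference of 255 (00 and FF). Gaps of 2 and 6 hide 85, 170, and 255. A gap of 4 hides every multiple of 17.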

Listing all the possible exceptions like this makes it sound dire. It’s not. The most likely transposition someone is going to make is swapping the order of two elements. That’s caught unless one of the numbers is FF and the other 00. If the transposition swaps non-neighboring numbers there’s a handful of new cases that might slip through. But you can estimate how often two numbers separated by one or three or five spaces are also different by 85 or 34 or another dangerous combination. (That estimate would suppose that every number from 0 to 255 is equally likely. They’re not, though, because popular machine language instruction codes such as A9 or 20 will be over-represented. So will references to important parts of computer memory such as, on the Commodore, FFD2.)

You will forgive me for not listing all the possible cases where competing typos in entering numbers will cancel out. I don’t want to figure them out either. I will go along with the magazines’ own assessment that there’s a “very slim chance” one could garble the line and get something that passes, though. After all, there are 18,446,744,073,709,551,616 conceivable lines of code one might type in, and only 255 possible checksums. Some garbled lines must match the correct checksum.

Could the checksum have been better?

The checksum could have been different. This is a trivial conclusion. “Better”? That demands thought. A good error-detection scheme needs to catch errors that are common or that are particularly dangerous. It should add as little overhead as possible.

The MLX checksum as it is catches many of the most common errors. A single entry mis-keyed, for example, except for the case of swapping 00 and FF. Or transposing one number for the one next to it. It even catches most transpositions with spaces between the transposed numbers. It catches almost all cases where one enters the entirely wrong line. And it does this for only two more keystrokes per eight pieces of data entered. That’s doing well.

The obvious gap is the inability to distinguish 00 from FF. There’s a cure for that, of course. Count the number of 00’s in a line, and include that as part of the checksum. It wouldn’t be particularly hard to type one more digit, that count, after each line’s two-digit checksum.

(Or, if you prefer, to have the extra digit count the FF’s instead.)

This adds to the overhead, yes, one more keystroke in what is already a good bit of typing. And one may ask whether you’re likely to ever touch 00 when you mean FF. The keys aren’t near one another. Then you learn that MLX soon got a patch which made keying much easier. They did this by making the characters in the rows under 7 8 9 0 type in digits. And the mapping used (on the Commodore 64) put the key to enter F right next to the key to enter 0.
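A Python sketch of that extended check follows. The extra count is the hypothetical improvement discussed here, not anything MLX did, and the data is made up. The MLX 2.0 checksum is computed with a modulo shortcut that gives the same result for any address other than 0.

```python
def mlx2_checksum(address, data):
    """MLX 2.0 checksum via a modulo shortcut (same result for address >= 1)."""
    ck = address - 254 * (address // 256)
    for b in data:
        ck = 2 * ck + b
    return (ck - 1) % 255 + 1


def extended_check(address, data):
    """Hypothetical extension: the MLX 2.0 checksum plus a count of 00's."""
    return mlx2_checksum(address, data), data.count(0x00)


right = [0x20, 0xD2, 0xFF, 0x00, 0xA9, 0x00, 0x60, 0xEA]
wrong = [0x20, 0xD2, 0x00, 0x00, 0xA9, 0x00, 0x60, 0xEA]  # FF keyed as 00
print(mlx2_checksum(0x0848, right) == mlx2_checksum(0x0848, wrong))    # True
print(extended_check(0x0848, right) == extended_check(0x0848, wrong))  # False
```

The count only notices 00’s appearing or disappearing. It won’t notice a 00 and an FF trading places, since the count stays the same.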

If you get ambitious, you might attempt even cleverer schemes. Suppose you want to catch those off-by-85 or off-by-17 transpositions that slip past the checksum. Why not, say, copy the last bits of each of your eight data, and use that to assemble a new checksum number? So, for example, in line 0801 the last bit of each number was 1-0-0-0-0-0-0-0, which is boring, but gives us 128, hexadecimal 80, as a second checksum. Line 0809 has eighth bits 1-0-0-0-1-0-1-0, or 138 (hex 8A). And so on, so each line would carry this second two-digit checksum along with the MLX checksum and the count of 00’s.

Now, though? We’ve got five keystrokes of overhead to sixteen keystrokes of data. Getting a bit bloated. It could be cleaned up a little; the single-digit count of 00’s (or FF’s) is redundant to the two-digit number formed from the cross-section I did there.
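Here is a Python sketch of the sampled-bit idea. The data values are made up: the first list is chosen so its last bits follow the 1-0-0-0-1-0-1-0 pattern above, and the swap example uses two numbers 85 apart, two positions apart, which MLX 2.0 alone misses. The helper names are mine, and the MLX checksum again uses the modulo shortcut that matches MLX for any address other than 0.

```python
def mlx2_checksum(address, data):
    """MLX 2.0 checksum via a modulo shortcut (same result for address >= 1)."""
    ck = address - 254 * (address // 256)
    for b in data:
        ck = 2 * ck + b
    return (ck - 1) % 255 + 1


def sampled_bits(data):
    """Hypothetical second checksum: the last bit of each of the eight
    numbers, read left to right as one binary number."""
    value = 0
    for b in data:
        value = (value << 1) | (b & 1)
    return value


print(f"{sampled_bits([0x33, 0x08, 0x0A, 0x00, 0x9F, 0x32, 0x31, 0x30]):02X}")  # 8A

right = [0x10, 0xA9, 0x65, 0x00, 0x00, 0x00, 0x00, 0x00]
swapped = [0x65, 0xA9, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00]  # 10 and 65 are 85 apart
print(mlx2_checksum(0x0848, right) == mlx2_checksum(0x0848, swapped))  # True
print(sampled_bits(right) == sampled_bits(swapped))                    # False
```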

And if we were working in a modern programming language we could reduce the MLX checksum and this sampled-digit checksum to a single number. Use the bitwise exclusive-or of the two numbers as the new, ‘mixed’ checksum. In the program you’d build the sampled-digit checksum from the entered data, exclusive-or it with the mixed checksum, and get back what should be the MLX checksum. Or exclusive-or the mixed checksum with the MLX checksum, and you get the sampled-digit checksum. You get two checksums in the space of one.
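A Python sketch of the mixing, using Python’s `^` operator for exclusive-or. The ‘mixed’ checksum is the hypothetical scheme just described, and the data is the made-up off-by-85 swap example, chosen because MLX 2.0 alone misses it.

```python
def mlx2_checksum(address, data):
    """MLX 2.0 checksum via a modulo shortcut (same result for address >= 1)."""
    ck = address - 254 * (address // 256)
    for b in data:
        ck = 2 * ck + b
    return (ck - 1) % 255 + 1


def sampled_bits(data):
    """Last bit of each of the eight numbers, read as one binary number."""
    value = 0
    for b in data:
        value = (value << 1) | (b & 1)
    return value


def mixed_checksum(address, data):
    """Hypothetical 'mixed' checksum: MLX 2.0 checksum XOR sampled bits."""
    return mlx2_checksum(address, data) ^ sampled_bits(data)


data = [0x10, 0xA9, 0x65, 0x00, 0x00, 0x00, 0x00, 0x00]
mixed = mixed_checksum(0x0848, data)
# Either check can be recovered from the mixed value and the other one.
assert mixed ^ sampled_bits(data) == mlx2_checksum(0x0848, data)
assert mixed ^ mlx2_checksum(0x0848, data) == sampled_bits(data)

swapped = [0x65, 0xA9, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00]  # the off-by-85 swap
print(mixed_checksum(0x0848, swapped) == mixed)  # False
```

One caution: the mixed value is still a single byte, so an error that shifts both parts in the same way cancels out in the mix.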

This almost magic move has two problems. This sampled-digit checksum could catch transpositions that are off by 85 or 17. It won’t catch transpositions off by 170 or by 34, though, which are just as deadly. It will catch transpositions off by odd multiples of 17, at least. You would catch transpositions off by 170 or by 34 if you sampled the seventh digit instead. (Sampling the fifth or the third digit catches only some of them, since carries from the lower digits can get in the way.) But then you won’t catch transpositions off by 85 or by 17. You can add new sampled checksums. This threatens us again with putting in too many check digits for actual data entry.
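A quick Python brute-force check shows which sampled digit reliably notices which dangerous difference. Here “reliably” means for every pair of values that far apart within 00 to FF. A transposition changes a sampled-digit checksum exactly when the two swapped numbers disagree in the sampled digit. The helper name is mine.

```python
def bit_differs(x, y, bit):
    """True if x and y disagree in the given bit (bit 0 is the last digit)."""
    return ((x >> bit) & 1) != ((y >> bit) & 1)


results = {}
for d in (17, 34, 85, 170):
    pairs = [(x, x + d) for x in range(256 - d)]
    results[d] = (
        all(bit_differs(x, y, 0) for x, y in pairs),  # eighth (last) digit
        all(bit_differs(x, y, 1) for x, y in pairs),  # seventh digit
    )
    print(d, results[d])
```

Odd differences like 17 and 85 always flip the last digit. Differences like 34 and 170, which are 2 more than a multiple of 4, always flip the seventh digit. Neither digit covers both kinds.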

The other problem is worse: Commodore 64 BASIC did not have a bitwise exclusive-or command. I was shocked, and I was more shocked to learn that Applesoft BASIC also lacked an exclusive-or. The Commodore 128 had exclusive-or, at least. But given that lack, and the inability to add an exclusive-or function that wouldn’t be infuriating? I can’t blame anyone for not trying.

So there is my verdict. There are some obvious enough ways that MLX’s checksum might have been able to catch more errors. But, given the constraints of the computers it was running on? A more sensitive error check likely would not have been available. Not without demanding much more typing. And, as another practical matter, demanding the program listings in the magazine be smaller and harder to read. The New MLX did, overall, a quite good job catching errors without requiring too much extra typing. We’ll probably never see its like again.