## How did Compute!’s Automatic Proofreader Work?

After that work on MLX, the programs that Compute! and Compute!’s Gazette used to enter machine language programs, I figured I was done. There was the Automatic Proofreader, used to catch errors in typing in BASIC programs. But that program was written in the machine language of the 6502 line of microchip. I’ve never been much on machine language and figured I couldn’t figure out how it worked. And then on a lark I tried and saw. And it turned out to be easy.

With qualifiers, of course. Compute! and Compute!’s Gazette had two generations of Automatic Proofreader for Commodore computers. The magazines also had Automatic Proofreaders for the other eight-bit computers that they covered. I trust that those worked the same way, but — with one exception — don’t know. I haven’t deciphered most of those other proofreaders.

Let me introduce how it was used, though. Compute! and Compute!’s Gazette offered computer programs to type in. Many of them were in BASIC, which uses many familiar words of English as instructions. But you can still make typos entering commands, and this causes bugs or crashes in programs. The Automatic Proofreader, for the Commodore (and the Atari), put in a little extra step after you typed in a line of code. It calculated a checksum. It showed that on-screen after every line you entered. And you could check whether that matched the checksum the magazine printed. So the listing in the magazine would be something like:

100 POKE 56,50:CLR:DIM IN$,I,J,A,B,A$,B$,A(7),N$ :rem 34
110 C4=48:C6=16:C7=7:Z2=2:Z4=254:Z5=255:Z6=256:Z7=127 :rem 238
120 FA=PEEK(45)+Z6*PEEK(46): BS=PEEK(55)+Z6*PEEK(56): H$="0123456789ABCDEF" :rem118  You would type in all those lines up to the :rem part. ‘rem’ here stands for ‘Remark’ and means the rest of the line is a comment to the programmer, not the computer. So they’d do no harm if you did enter them. But why type text you didn’t need? So after typing, say, 100 POKE 56,50:CLR:DIM IN$,I,J,A,B,A$,B$,A(7),N$you’d hit return and with luck get the number 34 up on screen. The Automatic Proofreader did not force you to re-type the line. You were on your honor to do that. (Nor were you forced to type lines in order. If you wished to type line 100, then 200, then 300, then 190, then 250, then 330, you could. The checksum would calculate the same.) And it didn’t only work for entering programs, these commands starting with line numbers. It would return a result for any command you entered. But since you wouldn’t know what the checksum should be for a freeform command, that didn’t tell you much. The first-generation Automatic Proofreader, which is what I’m talking about here, returned a number between 0 and 255. And it was a simple checksum. It could not detect transposed characters: the checksum for PIRNT was the same as PRINT and PRITN. And, it turns out, errors could offset: the checksum for PEEK(46) would be the same as that for PEEK(55). And there was one bit of deliberate insensitivity built in. Spaces would not be counted. The checksum for FA=PEEK(45)+Z6*PEEK(46) would be the same as FA = PEEK( 45 ) + Z6 * PEEK( 46 ). So you could organize text in whatever way was most convenient. Given this, and given the example of the first MLX, you may have a suspicion how the Automatic Proofreader calculated things. So did I and it turned out to be right. The checksum for the first-generation Automatic Proofreader, at least for the Commodore 64 and the Vic-20, was a simple sum. Take the line that’s been entered. Ignore spaces. But otherwise, take the ASCII code value for each character, and add that up, modulo 256. That is, if the sum is (say) 300, subtract 256 from that, that is, 44. I’m fibbing a little when I say it’s the ASCII code values. The Commodore computers used a variation on ASCII, called PETSCII (Commodore’s first line of computers was the PET). For ordinary text the differences between ASCII and PETSCII don’t matter. The differences come into play for various characters Commodores had. These would be symbols like the suits of cards, or little circles, or checkerboard patterns. Symbols that, these days, we’d see as emojis, or at least part of an extended character set. But translating all those symbols is … tedious, but not hard. If you want to do a simulated Automatic Proofreader in Octave, it’s almost no code at all. It turns out Octave and Matlab need no special command to get the ASCII code equivalent of text. So here’s a working simulation function retval = automatic_proofreader (oneLine) trimmedLine = strrep(oneLine, " ", ""); # In Matlab this should be replace(oneLine, " ", ""); retval = mod(sum(trimmedLine), 256); endfunction To call it type in a line of text: automatic_proofreader("100 POKE 56,50:CLR:DIM IN$,I,J,A,B,A$,B$,A(7),N$")  Capitalization matters! The ASCII code for capital-P is different from that for lowercase-p. Spaces won’t matter, though. More exotic characters, though, such as the color-setting commands, are trouble and let’s not deal with that right now. Also you can enclose your line in single-quotes, in case for example you want the checksum of a line that had double-quotes. Let’s agree that lines with single- and double-quotes don’t exist. I understand the way Commodore 64’s work well enough that I can explain the Automatic Proofreader’s code. I plan to do that soon. I don’t know how the Atari version of the Automatic Proofreader worked, but since it had the same weaknesses I assume it used the same algorithm. There is a first-generation Automatic Proofreader with a difference, though, and I’ll come to that. ## Here’s some Matlab/Octave code for your MLX simulator I am embarrassed that after writing 72,650 words about MLX 2.0 for last week, I left something out. Specifically, I didn’t include code for your own simulation of the checksum routine on a more modern platform. Here’s a function that carries out the calculations of the Commodore 64/128 or Apple II versions of MLX 2.0. It’s written in Octave, the open-source Matlab-like numerical computation routine. If you can read this, though, you can translate it to whatever language you find convenient. function [retval] = mlxII (oneline) z2 = 2; z4 = 254; z5 = 255; z6 = 256; z7 = 127; address = oneline(1); entries = oneline(2:9); checksum = oneline(10); ck = 0; ck = floor(address/z6); ck = address-z4*ck + z5*(ck>z7)*(-1); ck = ck + z5*(ck>z5)*(-1); # # This looks like but is not the sum mod 255. # The 8-bit computers did not have a mod function and # used this subtraction instead. # for i=1:length(entries), ck = ck*z2 + z5*(ck>z7)*(-1) + entries(i); ck = ck + z5*(ck>z5)*(-1); endfor # # The checksum *can* be 255 (0xFF), but not 0 (0x00)! # Using the mod function could make zeroes appear # where 255's should. # retval = (ck == checksum); endfunction  This reproduces the code as it was actually coded. Here’s a version that relies on Octave or Matlab’s ability to use modulo operations: function [retval] = mlxIIslick (oneline) factors = 2.^(7:-1:0); address = oneline(1); entries = oneline(2:9); checksum = oneline(10); ck = 0; ck = mod(address - 254*floor(address/256), 255); ck = ck + sum(entries.*factors); ck = mod(ck, 255); ck = ck + 255*(ck == 0); retval = (ck == checksum); endfunction  Enjoy! Please don’t ask when I’ll have the Automatic Proofreader solved. ## How did Compute!’s and Compute!’s Gazette’s New MLX Work? A couple months ago I worked out a bit of personal curiosity. This was about how MLX worked. MLX was a program used in Compute! and Compute!’s Gazette magazine in the 1980s, so that people entering machine-language programs could avoid errors. There were a lot of fine programs, some of them quite powerful, free for the typing-in. The catch is this involved typing in a long string of numbers, and if any were wrong, the program wouldn’t work. So MLX, introduced in late 1983, was a program to make typing in programs better. You would enter in a string of six numbers — six computer instructions or data — and a seventh, checksum, number. Back in January I worked out finally what the checksum was. It turned out to be simple. Take the memory location of the first of your set of six instructions, modulo 256. Add to it each of the six instructions, modulo 256. That’s the checksum. If it doesn’t match the typed-in checksum, there’s an error. There’s weaknesses to this, though. It’s vulnerable to transposition errors: if you were supposed to type in 169 002 and put in 002 169 instead, it wouldn’t be caught. It’s also vulnerable to casual typos: 141 178 gives the same checksum as 142 177. Which is all why the original MLX lasted only two years. # What Was The New MLX? The New MLX, also called MLX 2.0, appeared first in the June 1985 Compute!. This in a version for the Apple II. Six months later a version for the Commodore 64 got published, again in Compute!, though it ran in Compute!’s Gazette too. Compute! was for all the home computers of the era; Compute!’s Gazette specialized in the Commodore computers. I would have sworn that MLX got adapted for the Atari eight-bit home computers too, but can’t find evidence it ever was. By 1986 Compute! was phasing out its type-in programs and didn’t run much for Atari anymore. The new MLX made a bunch of changes. Some were internal, about how to store a program being entered. One was dramatic in appearance. In the original MLX people typed in decimal numbers, like 32 or 169. In the new, they would enter hexadecimal digits, like 20 or A9. And a string of eight numbers on a line, rather than six. This promised to save our poor fingers. Where before we needed to type in 21 digits to enter six instructions, now we needed 18 digits to enter eight instructions. So the same program would take about two-thirds the number of keystrokes. A plausible line of code would look something like: 0801:0B 08 00 00 9E 32 30 36 EC 0809:31 00 00 00 A9 00 8D 20 3A 0811:D0 20 CF 14 20 1B 08 4C 96 0819:C7 0B A9 93 20 D2 FF A9 34  (This from the first lines for “Q-Bird”, a game published in the December 1986 Compute!’s Gazette.) And, most important, there was a new checksum. # What was the checksum formula? I had a Commodore 64, so I always knew MLX from its Commodore version. The key parts of the checksum code appear in it in lines 350 through 390. Let me copy out the key code, spaced a bit out for easier reading: 360 A = INT(AD/Z6): GOSUB 350: A = AD - A*Z6: GOSUB 350: PRINT":"; 370 CK = INT(AD/Z6): CK = AD - Z4*CK + Z5*(CK>27): GOTO 390 380 CK = CK*Z2 + Z5*(CK>Z7) + A 390 CK = CK + Z5*(CK>Z5): RETURN  Z2, Z4, Z5, Z6, and Z7 are constants, defined at the start of the program. Z4 equals 254, Z5 equals 255, Z6 equals 256, and Z7, as you’d expect, is 127. Z2, meanwhile, was a simple 2. A bit of Commodore BASIC here. INT means to take the largest whole number not larger than whatever’s inside. AD is the address of the start of the line being entered. CK is the checksum. A is one number, one machine language instruction, being put in. GOSUB, “go to subroutine”, means to jump to another line and execute commands from there, and then RETURN. That’s the command. The program then continues from the next instruction after the GOSUB. In this code, line 350 converts a number from decimal to hexadecimal and prints out the hexadecimal version. This bit about adding Z5 * (CK>Z7) looks peculiar. Commodore BASIC evaluates logical expressions like CK > 27 into a bit pattern. That pattern looks like a number. We can use it like an integer. Many programming languages do something like that and it can allow for clever but cryptic programming tricks. An expression that’s false evaluates as 0; an expression that’s true evaluates as -1. So, CK + Z5*(CK>Z5) is an efficient little filter. If CK is smaller than Z5, it’s left untouched. If CK is larger than Z5, then subtract Z5 from CK. This keeps CK from being more than 255, exactly as we’d wanted. But you also notice: this code makes no sense. Like, starting the checksum with something derived from the address makes sense. Adding to that numbers based on the instructions makes sense. But the last instruction of line 370 is a jump straight to line 390. Line 380, where any of the actual instructions are put into the checksum, never gets called. Also, there’s eight instructions per line. Why is only one ever called? And this was a bear to work out. One friend insisted I consider the possibility that MLX was buggy and nobody had found the defect. I could not accept that, not for a program that was so central to so much programming for so long. Also, not considering that it worked. Make almost any entry error and the checksum would not match. # Where’s the rest of the checksum formula? This is what took time! I had to go through the code and find what other lines call lines 360 through 390. There’s a hundred lines of code in the Commodore version of MLX, which isn’t that much. They jump around a lot, though. By my tally 68 of these 100 lines jump to, or can jump to, something besides the next line of code. I don’t know how that compares to modern programming languages, but it’s still dizzying. For a while I thought it might be a net saving in time to write something that would draw a directed graph of the program’s execution flow. It might still be worth doing that. The checksum formula gets called by two pieces of code. One of them is the code when the program gets entered. MLX calculates a checksum and verifies whether it matches the ninth number entered. The other role is in printing out already-entered data. There, the checksum doesn’t have a role, apart from making the on-screen report look like the magazine listing. Here’s the code that calls the checksum when you’re entering code: 440 POKE 198,0: GOSUB 360: IF F THEN PRINT IN$ PRINT" ";
[ many lines about entering your data here ]
560 FOR I=1 TO 25 STEP 3:
B$= MID$(IN$, I): GOSUB 320: IF I<25 THEN GOSUB 380: A(I/3)=A 570 NEXT: IF ACK THEN GOSUB 1060: PRINT "ERROR: REENTER LINE ": F = 1: GOTO 440 580 GOSUB 1080: [ several more lines setting up a new line of data to enter ]  Line 320 started the routine that turned a hexadecimal number, such as 7F, into decimal, such as 127. It returns this number as the variable named A. IN$ was the input text, part of the program you you enter. This should be 27 characters long. A(I/3) was an element in an array, the string of eight instructions for that entry. Yes, you could use the same name for an array and for a single, unrelated, number. Yes, this was confusing.

But here’s the logic. Line 440 starts work on your entry. It calculates the part of the checksum that comes from the location in memory that data’s entered in. Line 560 does several bits of work. It takes the entered instructions and converts the strings into numbers. Then it takes each of those instruction numbers and adds its contribution to the checksum. Line 570 compares whether the entered checksum matches the computed checksum. If it does match, good. If it doesn’t match, then go back and re-do the entry.

The code for displaying a line of your machine language program is shorter:

630 GOSUB 360:
B = BS + AD - SA;
FOR I = B TO B+7:
A = PEEK(I):
GOSUB 350:
GOSUB 380:
PRINT S$; 640 NEXT: PRINT ""; A = CK: GOSUB 350: PRINT  The bit about PEEK is looking into the buffer, which holds the entered instructions, and reading what’s there. The GOSUB 350 takes the number ‘A’ and prints out its hexadecimal representation. GOSUB 360 calculates the part of the checksum that’s based on the memory location. The GOSUB 380 contributes the part based on every instruction. S$ is a space. It’s used to keep all the numbers from running up against each other.

# So what is the checksum formula?

The checksum takes in two parts. The first part is based on the address at the start of the line. Let me call that the number $AD$. The second part is based on the entry, the eight instructions following the line. Let me call them $D_1$ through $D_8$. So this is easiest described in two parts.

The base of the checksum, which I’ll call $ck_{0}$, is:

$ck_{0} = AD - 254 \cdot \left(floor(AD \div 256)\right) \\ \mbox { [ subtract 255 if this is 256 or greater ] }$

For example, suppose the address is 49152 (in hexadecimal, C000), which was popular for Commodore 64 programming. Then $ck_{0}$ would be 129. If the address is 2049 (in hexadecimal, 0801), another popular location, $ck_{0} would be 17. Generally, the initial$latex ck_{0} $increases by 1 as the memory address for the start of a line increases. If you entered a line that started at memory address 49153 (hexadecimal C001) for some reason, that $ck_{0}$ would be 130. A line which started at address 49154 (hexadecimal C002) would have $ck_{0}$ start at 131. This progression continues until $ck_{0}$ would reach 256. Then that greater-than filter at the end of the expression intrudes. A line starting at memory address 49278 (C07E) has $ck_{0}$ of 255, and one starting at memory address 49279 (C07F) has $ck_{0}$ of 1. I see reason behind this choice. That’s the starting point. Now to use the actual data, the eight pieces $D_1$ through $D_8$ that are the actual instructions. The easiest way for me to describe this is do it as a loop, using $ck_{0}$ to calculate $ck_{1}$, and $ck_{1}$ to define $ck_{2}$ and so on. $ck_{j} = 2 \cdot ck_{j - 1} \cdots \\ \mbox { [ subtract 255 if this is 256 or greater ] } \\ \cdots + d_{j} \\ \mbox { [ subtract 255 if this is 256 or greater ] } \mbox{for j = 1 ... 8}$ That is, for each piece of data in turn, double the existing checksum and add the next data to it. If this sum is 256 or larger, subtract 255 from it. The working sum never gets larger than 512, thanks to that subtract-255-rule after the doubling. And then again that subtract-255-rule after adding $d_j$. Repeat through the eighth piece of data. That last calculated checksum, $ck_{8}$, is the checksum for the entry. If $ck_{8}$ does match the entered checksum, go on to the next entry. If $ck_{8}$ does not match the entered checksum, give a warning and go back and re-do the entry. # Why was MLX written like that? There are mysterious bits to this checksum formula. First is where it came from. It’s not, as far as I can tell, a standard error-checking routine, or if it is it’s presented in a form I don’t recognize. But I know only small pieces of information theory, and it might be that this is equivalent to a trick everybody knows. The formula is, at heart, “double your working sum and add the next instruction, and repeat”. At the end, take the sum modulo 255 so that the checksum is no more than two hexadecimal digits. Almost. In studying the program I spent a lot of time on a nearly-functionally-equivalent code that used modulo operations. I’m confident that if Apple II and Commodore BASIC had modulo functions, then MLX would have used them. But those eight-bit BASICs did not. Instead the programs tested whether the working checksum had gotten larger than 255, and if it had, then subtracted 255 from it. This is a little bit different. It is possible for a checksum to be 255 (hexadecimal FF). This even happened. In the June 1985 Compute!, introducing the new MLX for the Apple II, we have this entry as part of the word processor Speedscript 3.0 that anyone could type in: 0848: 20 A9 00 8D 53 1E A0 00 FF  What we cannot have is a checksum of 0. (Unless a program began at memory location 0, and had instructions of nothing but 0. This would not happen. The Commodore 64, and the Apple II, used those low-address memory locations for system work. No program could use them.) Were the formulas written with modulo operations, we’d see 00 where we should see FF. Doubling the working sum and then setting it to be in a valid range — from 1 to 255 — is easy enough. I don’t know how the designer settled on doubling, but have hypotheses. It’s a good scheme for catching transposition errors, entering 20 FF D2 where one means to enter 20 D2 FF. The initial $ck_{0}$ seems strange. The equivalent step for the original MLX was the address on which the entry started, modulo 256. Why the change? My hypothesis is this change was to make it harder to start typing in the wrong entry. The code someone typed in would be long columns of numbers, for many pages. The text wasn’t backed by alternating bands of color, or periodic breaks, or anything else that made it harder for the eye to skip one or more lines of machine language code. In the original MLX, skipping one line, or even a couple lines, can’t go undetected. The original MLX entered six pieces of data at a time. If your eye skips a line, the wrong data will mismatch the checksum by 6, or by 12, or by 18 — by 6 times the number of lines you miss. To have the checksum not catch this error, you have to skip 128 lines, and that’s not going to happen. That’s about one and a quarter columns of text and the eye just doesn’t make that mistake. Skimming down a couple lines, yes. Moving to the next column, yes. Next column plus 37 lines? No. In the new MLX, one enters eight instructions of code at a time. So skipping a line increases the checksum by 8 times the number of lines skipped. If the initial checksum were the line’s starting address modulo 256, then we’d only need to skip 16 lines to get the same initial checksum. Sixteen lines is a bit much to skip, but it’s less than one-sixth of a column. That’s not too far. And the eye could see 0968 where it means to read 0868. That’s a plausible enough error and one the new checksum would be helpless against. So the more complicated, and outright weird, formula that MLX 2.0 uses betters this. Skipping 16 lines — entering the line for 0968 instead of 0868 — increases the base checksum by 2. Combined with the subtract-255 rule, you won’t get a duplicate of the checksum for, in most cases, 127 lines. Nobody is going to make that error. So this explains the components. Why is the Commodore 64 version of MLX such a tangle of spaghetti code? Here I have fewer answers. Part must be that Commodore BASIC was prone to creating messes. For example, it did not really have functions, smaller blocks of code with their own, independent, sets of variables. These would let, say, numbers convert from hexadecimal to decimal without interrupting the main flow of the program. Instead you had to jump, either by GOTO or GOSUB, to another part of the program. The Commodore or Apple II BASIC subroutine has to use the same variable names as the main part of the program, so, pick your variables wisely! Or do a bunch of reassigning values before and after the subroutine’s called. To be precise, Commodore BASIC did let one define some functions. This by using the DEF FN command. It could take one number as the input, and return one number as output. The whole definition of the function couldn’t be more than 80 characters long. It couldn’t have a loop. Given these constraints, you can see why user-defined functions went all but unused. The Commodore version jumps around a lot. Of its 100 lines of code, 68 jump or can jump to somewhere else. The Apple II version has 52 lines of code, 28 of which jump or can jump to another line. That’s just over 50 percent of the lines. I’m not sure how much of this reflects Apple II’s BASIC being better than Commodore’s. Commodore 64 BASIC we can charitably describe as underdeveloped. The Commodore 128 version of MLX is a bit shorter than the 64’s (90 lines of code). I haven’t analyzed it to see how much it jumps around. (But it does have some user-defined functions.) The most mysterious element, to me, is the defining of some constants like Z2, which is 2, or Z5, which is 255. The Apple version of this doesn’t uses these constants. It uses 2 or 255 or such in the checksum calculation. I can rationalize replacing 254 with Z4, or 255 with Z5, or 127 with Z7. The Commodore 64 allowed only 80 tokens in a command line. So these values might save only a couple characters, but if they’re needed characters, good. Z2, though, only makes the line longer. I would have guessed that this reflected experiments. That is, trying out whether one should double the existing sum and add a new number, or triple, or quadruple, or even some more complicated rule. But the Apple II version appeared first, and has the number 2 hard-coded in. This might reflect that Tim Victor, author of the Apple II version, preferred to clean up such details while Ottis R Cowper, writing the Commodore version, did not. Lacking better evidence, I have to credit that to style. # Is this checksum any good? Whether something is “good” depends on what it is supposed to do. The New MLX, or MLX 2.0, was supposed to make it possible to type in long strings of machine-language code while avoiding errors. So it’s good if it protects against those errors without being burdensome. It’s a light burden. The person using this types in 18 keystrokes per line. This carries eight machine-language instructions plus one checksum number. So only one-ninth of the keystrokes are overhead, things to check that other work is right. That’s not bad. And it’s better than the original version of MLX, where up to 21 keystrokes gave six instructions. And one-seventh of the keystrokes were the checksum overhead. The checksum quite effectively guards against entering instructions on a wrong line. To get the same checksum that (say) line 0811 would have you need to jump to line 0C09. In print, that’s another column over and a third of the way down the page. It’s a hard mistake to make. Entering a wrong number in the instructions — say, typing in 22 where one means 20 — gets caught. The difference gets multiplied by some whole power of two in the checksum. Which power depends on what number’s entered wrong. If the eighth instruction is entered wrong, the checksum is off by that error. If the seventh instruction is wrong, the checksum is off by two times that error. If the sixth instruction is wrong, the checksum is off by four times that error. And so on, so that if the first instruction is wrong, the checksum is off by 128 times that error. And these errors are taken not-quite-modulo 255. The only way to enter a single number wrong without the checksum catching it is to type something 255 higher or lower than the correct number. And MLX confines you to entering a two-hexadecimal-digit number, that is, a number from 0 to 255. The only mistake it’s possible to make is to enter 00 where you mean FF, or FF where you mean 00. What about transpositions? Here, the the new MLX checksum shines. Doubling the sum so far and adding a new term to it makes transpositions very likely to be caught. Not many, though. A transposition of the data at position number j and at position number k will go unnoticed only when $d_j$ and $d_k$ happen to make true $\left(2^j - 2^k\right)\cdot\left(d_j - d_k\right) = 0 \mbox{ mod } 255$ This doesn’t happen much. It needs $d_j$ and $d_k$ to be 255 apart. Or for $\left(2^j - 2^k\right)$ to be a divisor of 255 and $d_j - d_k$ to be another divisor. I’ll discuss when that happens in the next section. In practice, this is a great simple checksum formula. It isn’t hard to calculate, it catches most of the likely data-entry mistakes, and it doesn’t require much extra data entry to work. # What flaws did the checksum have? The biggest flaw the MLX 2.0 checksum scheme has is that it’s helpless to distinguish FF, the number 255, from 00, the number 0. It’s so vulnerable to this that a warning got attached to the MLX listing in every issue of the magazines: Because of the checksum formula used, MLX won’t notice if you accidentally type FF in place of 00, and vice versa. And there’s a very slim chance that you could garble a line and still end up with a combination of characters that adds up to the proper checksum. However, these mistakes should not occur if you take reasonable care while entering data. So when can a transposition go wrong? Well, any time you swap a 00 and an FF on a line, however far apart they are. But also if you swap the elements in position j and k, if $2^j - 2^k$ is a divisor of 255 and $d_j - d_k$ works with you, modulo 255. For a transposition of adjacent instructions to go wrong — say, the third and the fourth numbers in a line — you need the third and fourth numbers to be 255 apart. That is, entering 00 FF where you mean FF 00 will go undetected. But that’s the only possible case for adjacent instructions. A transposition past one space — say, swapping the third and the fifth numbers in a line — needs the two to be 85, 170, or 255 away. So, if you were supposed to enter (in hexadecimal) EE A9 44 and you instead entered 44 A9 EE, it would go undetected. That’s the only way a one-space transposition can happen. MLX will catch entering EE A9 45 as 45 A9 EE. A transposition past two spaces — say, swapping the first and the fifth numbers — will always be caught unless the numbers are 255 apart, that is, a 00 and an FF. A transposition past three spaces — like, swapping the first and the sixth numbers — is vulnerable again. Then if the first and sixth numbers are off by 17 (or a multiple of 17) the swap will go unnoticed. A transposition across four spaces will always be caught unless it’s 00 for FF. A transposition across five spaces — like, swapping the second and eighth numbers — has to also have the two numbers be 85 or 170 or 255 apart to sneak through. And a transposition across six spaces — this has to be swapping the first and last elements in the line — again will be caught unless it’s 00 for FF. Listing all the possible exceptions like this makes it sound dire. It’s not. The most likely transposition someone is going to make is swapping the order of two elements. That’s caught unless one of the numbers is FF and the other 00. If the transposition swaps non-neighboring numbers there’s a handful of new cases that might slip through. But you can estimate how often two numbers separated by one or three or five spaces are also different by 85 or 34 or another dangerous combination. (That estimate would suppose that every number from 0 to 255 is equally likely. They’re not, though, because popular machine language instruction codes such as A9 or 20 will be over-represented. So will references to important parts of computer memory such as, on the Commodore, FFD2.) You will forgive me for not listing all the possible cases where competing typos in entering numbers will cancel out. I don’t want to figure them out either. I will go along with the magazines’ own assessment that there’s a “very slim chance” one could garble the line and get something that passes, though. After all, there are 18,446,744,073,709,551,615 conceivable lines of code one might type in, and only 255 possible checksums. Some garbled lines must match the correct checksum. # Could the checksum have been better? The checksum could have been different. This is a trivial conclusion. “Better”? That demands thought. A good error-detection scheme needs to catch errors that are common or that are particularly dangerous. It should add as little overhead as possible. The MLX checksum as it is catches many of the most common errors. A single entry mis-keyed, for example, except for the case of swapping 00 and FF. Or transposing one number for the one next to it. It even catches most transpositions with spaces between the transposed numbers. It catches almost all cases where one enters the entirely wrong line. And it does this for only two more keystrokes per eight pieces of data entered. That’s doing well. The obvious gap is the inability to distinguish 00 from FF. There’s a cure for that, of course. Count the number of 00’s — or the number of FF’s — in a line, and include that as part of the checksum. It wouldn’t be particularly hard to enter (going back to the Q-Bird example) 0801:0B 08 00 00 9E 32 30 36 EC 2 0809:31 00 00 00 A9 00 8D 20 3A 4 0811:D0 20 CF 14 20 1B 08 4C 96 0 0819:C7 0B A9 93 20 D2 FF A9 34 0  (Or if you prefer, to have the extra checksums be 0 0 0 1.) This adds to the overhead, yes, one more keystroke in what is already a good bit of typing. And one may ask whether you’re likely to ever touch 00 when you mean FF. The keys aren’t near one another. Then you learn that MLX soon got a patch which made keying much easier. They did this by making the characters in the rows under 7 8 9 0 type in digits. And the mapping used (on the Commodore 64) put the key to enter F right next to the key to enter 0. If you get ambitious, you might attempt even cleverer schemes. Suppose you want to catch those off-by-85 or off-by-17 differences that would detect transpositions. Why not, say, copy the last bits of each of your eight data, and use that to assemble a new checksum number? So, for example, in line 0801 up there the last bit of each number was 1-0-0-0-0-0-0-0 which is boring, but gives us 128, hexadecimal 80, as a second checksum. Line 0809 has eighth bits 1-0-0-0-1-0-1-0-0, or 138 (hex 8A). And so on; so we could have: 0801:0B 08 00 00 9E 32 30 36 EC 2 80 0809:31 00 00 00 A9 00 8D 20 3A 4 8A 0811:D0 20 CF 14 20 1B 08 4C 96 0 24 0819:C7 0B A9 93 20 D2 FF A9 34 0 B3  Now, though? We’ve got five keystrokes of overhead to sixteen keystrokes of data. Getting a bit bloated. It could be cleaned up a little; the single-digit count of 00’s (or FF’s) is redundant to the two-digit number formed from the cross-section I did there. And if we were working in a modern programming language we could reduce the MLX checksum and this sampled-digit checksum to a single number. Use the bitwise exclusive-or of the two numbers as the new, ‘mixed’ checksum. Exclusive-or the sampled-digit with the mixed checksum and you get back the classic MLX checksum. You get two checksums in the space of one. In the program you’d build the sampled-digit checksum, and exclusive-or it with the mixed checksum, and get back what should be the MLX checksum. Or take the mixed checksum and exclusive-or it with the MLX checksum, and you get the sampled-digit checksum. This almost magic move has two problems. This sampled digit checksum could catch transpositions that are off by 85 or 17. It won’t catch transpositions off by 17 or by 34, though, just as deadly. It will catch transpositions off by odd multiples of 17, at least. You would catch transpositions off by 85 or by 34 if you sampled the seventh digit, at least. Or if you build a sample based on the fifth or the third digit. But then you won’t catch transpositions off by 85 or by 17. You can add new sampled checksums. This threatens us again with putting in too many check digits for actual data entry. The other problem is worse: Commodore 64 BASIC did not have a bitwise exclusive-or command. I was shocked, and I was more shocked to learn that Applesoft BASIC also lacked an exclusive-or. The Commodore 128 had exclusive-or, at least. But given that lack, and the inability to add an exclusive-or function that wouldn’t be infuriating? I can’t blame anyone for not trying. So there is my verdict. There are some obvious enough ways that MLX’s checksum might have been able to catch more errors. But, given the constraints of the computers it was running on? A more sensitive error check likely would not have been available. Not without demanding much more typing. And, as a another practical matter, demanding the program listings in the magazine be smaller and harder to read. The New MLX did, overall, a quite good job catching errors without requiring too much extra typing. We’ll probably never see its like again. ## How Did Compute!’s Gazette’s MLX Program Work? This is, at least, a retrocomputing-adjacent piece. I’m looking back at the logic of a common and useful tool from the early-to-mid-80s and why it’s built that way. I hope you enjoy. It has to deal with some of the fussier points about how Commodore 64 computers worked. If you find a paragraph is too much technical fussing for you, I ask you to not give up, just zip on to the next paragraph. It’s interesting to know why something was written that way, but it’s all right to accept that it was and move to the next point. ## How Did You Get Computer Programs In The 80s? When the world and I were young, in the 1980s, we still had computers. There were two ways to get software, though. One was trading cassette tapes or floppy disks with cracked programs on them. (The cracking was taking off the copy-protection.) The other was typing. You could type in your own programs, certainly, just like you can make your own web page just by typing. Or you could type in a program. We had many magazines and books that had programs ready for entry. Some were serious programs, spreadsheets and word processors and such. Some were fun, like games or fractal-generators or such. Some were in-between, programs to draw or compose music or the such. Some added graphics or sound commands that the built-in BASIC programming language lacked. All this was available for the$2.95 cover price, or ten cents a page at the library photocopier. I had a Commodore 64 for most of this era, moving to a Commodore 128 (which also ran Commodore 64 programs) in 1989 or so. So my impressions, and this article, default to the Commodore 64 experience.

These programs all had the same weakness. You had to type them in. You can expect to make errors. If the program was written in BASIC you had a hope of spotting errors. The BASIC programming language uses common English words for its commands. Their grammar is not English, but it’s also very formulaic, and not hard to pick up. One has a chance of spotting mistakes if it’s 250 PIRNT "SUM; " S one typed.

But many programs were distributed as machine language. That is, the actual specific numbers that correspond to microchip instructions. For the Commodore 64, and most of the eight-bit home computers of the era, this was the 6502 microchip. (The 64 used a variation, the 6510. The differences between the 6502 and 6510 don’t matter for this essay.) Machine language had advantages, making the programs run faster, and usually able to do more things than BASIC could. But a string of numbers is only barely human-readable. Oh, you might in time learn to recognize the valid microchip instructions. But it is much harder to spot the mistakes on entering 32 255 120. That last would be a valid command on any eight-bit Commodore computer. It would have the computer print something, if it weren’t for the transposition errors.

## What Was MLX and How Did You Use It?

The magazines came up with tools to handle this. In the 398-page(!) December 1983 issue of Compute!, my favorite line of magazines introduced MLX. This was a program, written in BASIC, which let you enter machine language programs. Charles Brannon has the credit for writing the article which introduced it. I assume he also wrote the program, but could be mistaken. I’m open to better information. Other magazines had other programs to do the same work; I knew them less well. MLX formatted machine language programs to look like this:

49152 :169,002,141,178,002,169,149
49158 :000,141,179,002,141,180,137
49164 :002,141,181,002,169,001,252
49170 :141,183,002,169,003,141,145


This was the first few lines of code for a game called Turnabout, by Mark Tuttle and Kevin Mykytyn. It ran in issue #28 of Commodore-computers-spinoff magazine Compute!’s Gazette. You can see the original at archive.org and imagine the poor wrists of people typing all this in.

What did all this mean, though? These were lines you would enter in while running MLX. Before the colon was a location in memory. The numbers after the colon — the entries, I’ll call them — are six machine language instructions, one number to go into each memory cell. So, the number 169 was destined to go into memory location 49152. The number 002 would go into memory location 49153. The number 141 would go into memory location 49154. And so on; 000 would go into memory location 49158, 141 into 49159, 179 into 49160. 002 would go into memory location 49164; 141 would go into memory location 49170. And so on.

MLX would prompt you with the line number, the 49152 or 49158 or 49164 or so on. Machine language programs could go into almost any memory location. You had to tell it where to start. 49152 was a popular location for Commodore 64 programs. It was the start of a nice block of memory not easily accessed except by machine language programs. Then you would type in the entries, the numbers that follow. This was a reasonably efficient way to key this stuff in. MLX automatically advanced the location in memory and would handle things like saving the program to tape or disk when you were done.

The alert reader notices, though, that there are seven entries after the colon in each line. That seventh number is the checksum. It’s the guard that Compute! and Compute!’s Gazette put against typos. This seventh number was a checksum. MLX did a calculation based on the memory location and the first six numbers of the line. If it was not the seventh number on the line, then there was an error somewhere. You had to re-enter the line to get it right.

The thing I’d wondered, and finally got curious enough to explore, was how it calculated this.

## What Was The Checksum And How Did It Work?

Happily, Compute! and Compute!’s Gazette published MLX in almost every issue, so it’s easy to find. You can see it, for example, on page 123 of the October 1985 issue of Compute!’s Gazette. And MLX was itself a BASIC program. There are quirks of the language, and its representation in magazine print, that take time to get used to. But one can parse it without needing much expertise. One important thing is that most Commodore BASIC commands didn’t need spaces after them. For an often-used program like this they’d skip the spaces. And the : symbol denoted the end of one command and start of another. So, for example, PRINTCHR$(20):IFN=CKSUMTHEN530 one learns means PRINT CHR$(20); IF N = CKSUM THEN 530.

So how does it work? MLX is, as a program, convoluted. It’s well-described by the old term “spaghetti code”. But the actual calculation of the checksum is done in a single line of the program, albeit one with several instructions. I’ll print it, but with some spaces added in to make it easier to read.

500 CKSUM = AD - INT(AD/256)*256:
FOR I = 1 TO 6:
CKSUM = (CKSUM + A(I))AND 255:
NEXT


Most of this you have a chance of understanding even if you don’t program. CKSUM is the checksum number. AD is the memory address for the start of the line. A is an array of six numbers, the six numbers of that line of machine language. I is an index, a number that ranges from 1 to 6 here. Each A(I) happens to be a number between 0 and 255 inclusive, because that’s the range of integers you can represent with eight bits.

## What Did This Code Mean?

So to decipher all this. Starting off. CKSUM = AD - INT(AD/256)*256. INT means “calculate the largest integer not greater than whatever’s inside”. So, like, INT(50/256) would be 0; INT(300/256) would be 1; INT(600/256) would be 2. What we start with, then, is the checksum is “the remainder after dividing the line’s starting address by 256”. We’re familiar with this, mathematically, as “address modulo 256”.

In any modern programming language, we’d write this as CKSUM = MOD(AD, 256) or CKSUM = AD % 256. But Commodore 64 BASIC didn’t have a modulo command. This structure was the familiar and comfortable enough workaround. But, read on.

The next bit was a for/next loop. This would do the steps inside for every integer value of I, starting at 1 and increasing to 6. CKSUM + A(I) has an obvious enough intention. What is the AND 255 part doing, though?

AND, here, is a logic operator. For the Commodore 64, it works on numbers represented as two-byte integers. These have a memory representation of 11111111 11111111 for ‘true’, and 00000000 00000000 for ‘false’. The very leftmost bit, for integers, is a plus-or-minus-sign. If that leftmost bit is a 1, the number is negative; if that leftmost bit is a 0, the number is positive. Did you notice me palming that card, there? We’ll come back to that.

Ordinary whole numbers can be represented in binary too. Like, the number 26 has a binary representation of 00000000 00011010. The number, say, 14 has a binary representation of 00000000 00001110. 26 AND 14 is the number 00000000 00001010, the binary digit being a 1 only when both the first and second numbers have a 1 in that column. This bitwise and operation is also sometimes referred to as masking, as in masking tape. The zeroes in the binary digits of one number mask out the binary digits of the other. (Which does the masking is a matter of taste; 26 AND 14 is the same number as 14 AND 26.)

The binary 00000000 0001010 is the decimal number 10. So you can see that generally these bitwise and operations give you weird results. Taking the bitwise and for 255 is more predictable, though. The number 255 has a bit representation of 00000000 11111111. So what (CKSUM + A(I)) AND 255 does is … give the remainder after dividing (CKSUM + A(I)) by 256. That is, it’s (CKSUM + A(I)) modulo 256.

The formula’s not complicated. To write it in mathematical terms, the calculation is:

$ck = \left(addr + \sum_{i = 1}^6 a_i\right) mod 256$

## Why Write It Like That?

So we have a question. Why are we calculating a number modulo 256 by two different processes? And in the same line of the program?

We get an answer by looking at the binary representation of 49152, which is 11000000 00000000. Remember that card I just palmed? I had warned that if the leftmost digit there were a 1, the number was understood to be negative. 49152 is many things, none of them negative.

So now we know the reason behind the odd programming choice to do the same thing two different ways. As with many odd programming choices it amounts to technical details of how Commodore hardware worked. The Commodore 64’s logical operators — AND, OR, and NOT — work on variables stored as two-byte integers. Two-byte integers can represent numbers from -32,768 up to +32,767. But memory addresses on the Commodore 64 are indexed from 0 up to 65,535. We can’t use bit masking to do the modulo operation, not on memory locations.

I have a second question, though. Look at the work inside the FOR loop. It takes the current value of the checksum, adds one of the entries to it, and takes the bitwise AND of that with 255. Why? The value would be the same if we waited until the loop was done to take the bitwise AND. At least, it would be unless the checksum grew to larger than 32,767. The checksum will be the sum of at most seven numbers, none of them larger than 255, though, so that can’t be the contraint. It’s usually faster to do as little inside a loop as possible, so, why this extravagance?

My first observation is that this FOR loop does the commands inside it six times. And logical operations like AND are very fast. The speed difference could not possibly be perceived. There is a point where optimizing your code is just making life harder for yourself.

My second observation goes back to the quirks of the Commodore 64. You entered commands, like the lines of a BASIC program, on a “logical line” that allowed up to eighty tokens. For typing in commands this is the same as the number of characters. Can this line be rewritten so there’s no redundant code inside the for loop, and so it’s all under 80 characters long?

Yes. This line would have the same effect and it’s only 78 characters:

500 CKSUM=AD-INT(AD/256)*256:FORI=1TO6:CKSUM=CKSUM+A(I):NEXT:CKSUM=CKSUMAND255


Why not use that, then?

I don’t have a clear answer. I suspect it’s for the benefit of people typing in the MLX program. In typing that in I’d have trouble not putting in a space between FOR and I, or between CKSUM and AND. Also before and after the TO and before and after AND. This would make the line run over 80 characters and make it crash. The original line is 68 characters, short enough that anyone could add a space here and there and not mess up anything. In looking through MLX, and other programs, I find there are relatively few lines more than 70 characters long. I have found them as long as 76 characters, though. I can’t rule out there being 78- or 79-character lines. They would have to suppose anyone typing them in understands when the line is too long.

There’s an interesting bit of support for this. Compute! also published machine language programs for the Atari 400 and 800. A version of MLX came out for the Atari at the same time the Commodore 64’s came out. Atari BASIC allowed for 120 characters total. And the equivalent line in Atari MLX was:

500 CKSUM=ADDR-INT(ADDR/256)*256:FOR I=1 TO 6:CKSUM=CKSUM+A(I):CKSUM=CKSUM-256*(CKSUM>255):NEXT I


This has a longer name for the address variable. It uses a different way to ensure that CKSUM stays a number between 0 and 255. But the whole line is only 98 characters.

We could save more spaces on the Commodore 64 version, though. Commodore BASIC “really” used only the first two characters of a variable name. To write CKSUM is for the convenience of the programmer. To the computer it would be the same if we wrote CK. We could even truncate it to CK for this one line of code. The only penalty would be confusing the reader who doesn’t remember that CK and CKSUM are the same variable.

And there’s no reason that this couldn’t have been two lines. One line could add up the checksum and a second could do the bitwise AND. Maybe this is all a matter of the programmer’s tastes.

In a modern language this is all quite zippy to code. To write it in Octave or Matlab is something like:

function [checksOut] = oldmlx(oneline)
address = oneline(1);
entries = oneline(2:7);
keyedChecksum = oneline(8);

checksum = mod(address, 256);
checksum += sum(entries);
checksum = mod(checksum, 256);

checksOut = (keyedChecksum == checksum);
endfunction


This is a bit verbose. I want it to be easier to see what work is being done. We could make it this compact:

function [checksOut] = oldmlx(oneline)
checksOut = !(mod(sum(oneline(1:7))-oneline(8), 256));
endfunction


I don’t like compressing my thinking quite that much, though.

But that’s the checksum. Now the question: did it work?

## Was This Checksum Any Good?

Since Compute! and Compute!’s Gazette used it for years, the presumptive answer is that it did. The real question, then, is did it work well? “Well” means does it prevent the kinds of mistakes you’re likely to make without demanding too much extra work. We could, for example, eliminate nearly all errors by demanding every line be entered three times and accept only a number that’s entered the same at least two of three times. That’s an incredible typing load. Here? We have to enter one extra number for every six. Much lower load, but it allows more errors through. But the calculation is — effectively — simply “add together all the numbers we typed in, and see if that adds to the expected total”. If it stops the most likely errors, though, then it’s good. So let’s consider them.

The first and simplest error? Entering the wrong line. MLX advanced the memory location on its own. So if you intend to write the line for memory location 50268, and your eye slips and you start entering that for 50274 instead? Or even, reading left to right, going to line 50814 in the next column? Very easy to do. This checksum will detect that nicely, though. Entering one line too soon, or too late, will give a checksum that’s off by 6. If your eye skips two lines, the checksum will be off by 12. The only way to not have the checksum miss is to enter a line that’s some multiple of 256 memory locations away. And since each line is six memory locations, that means you have to jump 768 memory locations away. That is 128 lines away. You are not going to make that mistake. (Going from one column in the magazine to the next is a jump of 91 lines. The pages were 8½-by-11 pages, so were a bit easier to read than the image makes them look.)

How about other errors? You could mis-key, say, 169. But think of the plausible errors. Typing it in as 159 or 196 or 269 would be detected by the checksum. The only one that wouldn’t would be to enter a number that’s equal to 169, modulo 256. So, 425, say, or 681. There is nobody so careless as to read 169 and accidentally type 425, though. In any case, other code in MLX rejects any data that’s not between 0 and 255, so that’s caught before the checksum comes into play.

So it’s safe against the most obvious mistake. And against mis-keying a single entry. Yes, it’s possible that you typed in the whole line right but mis-keyed the checksum. If you did that you felt dumb but re-entered the line. If you even noticed and didn’t just accept the error report and start re-entering the line.

What about mis-keying double entries? And here we have trouble. Suppose that you’re supposed to enter 169, 062 and instead enter 159, 072. They’ll add to the same quantity, and the same checksum. All that’s protecting you is that it takes a bit of luck to make two errors that exactly balance each other. But, then, slipping and hitting an adjacent number on the keyboard is an easy mistake to make.

Worse is entry transposition. If you enter 062, 169 instead you have made no checksum errors. And you won’t even be typing any number “wrong”. At least with the mis-keying you might notice that 169 is a common number and 159 a rare one in machine language. (169 was the command “Load Accumulator”. That is, copy a number into the Central Processing Unit’s accumulator. This was one of three on-chip memory slots. 159 was no meaningful command. It would only appear as data.) Swapping two numbers is another easy error to make.

And they would happen. I can attest from experience. I’d had at least one program which, after typing, had one of these glitches. After all the time spent entering it, I ended up with a program that didn’t work. And I never had the heart to go back and track down the glitch or, more efficiently, retype the whole thing from scratch.

The irony is that the program with the critical typing errors was a machine language compiler. It’s something that would have let me write this sort of machine language code. Since I never reentered it, I never created anything but the most trivial of machine language programs for the 64.

So this MLX checksum was fair. It devoted one-seventh of the typing to error detection. It could catch line-swap errors, single-entry mis-keyings, and transpositions within one entry. It couldn’t catch transposing two entries. So that could have been better. I hope to address that soon.