(Apologies to Tom T. Hall; I couldn’t resist, and it leaves room for a followup.)

Recently a gentleman asked me whether I have a CoCo 3 (more formally, a TRS-80 Color Computer 3). He was curious about whether I could run a couple of benchmarks both in Color BASIC and in BASIC09.

I do have a CoCo 3, but I need more space to excavate the relevant hardware and to set it up. I’m doing well to have space for my desktop computer, a Raspberry Pi 3 B+, and the Matchbox CoCo, an FPGA-based CoCo 3-like system roughly as big as the RPi.

I also have MAME installed on my desktop computer, and know from past experience that its CoCo emulations are *very* accurate when it comes to execution speed. (We’ll see a confirmation of that shortly.) So we fired up MAME and went looking for the benchmarks: one from *Creative Computing* and one from *BYTE*.

## David Ahl’s Small Benchmark

David Ahl, founder of the great personal computer magazine *Creative Computing*, wrote a short and simple benchmark in BASIC. (The Wikipedia article for “*Creative Computing* Benchmark” has a good set of references.)

It has a crude test of the random number generator, but the major part of it repeatedly adds the integers from 1 to 100, with a twist: each such number has its 1024th root taken via ten successive square roots, and the result is raised back to the 1024th power via ten successive squarings with the BASIC exponentiation operator. This definitely dominates execution time, and one can get some idea of how accurate the square root and exponentiation operators are by how far off the sum is from 5050, or rather how far a fifth of the sum is from 1010. (I don’t know why it’s done that way, unless it’s to add one last potential source of error.)

For Color BASIC, I used the code straight from Ahl’s article, save for leaving out the comments. I timed it with a stopwatch, and such is the talent of the folks behind MAME and its CoCo emulator that the result, including run time (2:23), exactly matched the CoCo entry in the article.

For BASIC09, I initially left out declarations, but had to initialize R and S, and I added lines to print DATE$ (whose value is a string holding the current date and time in the format yy/mm/dd hh:mm:ss) at the start and end. The source:

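```
PROCEDURE Ahl
  REM Timestamps bracket the run; DATE$ resolves to one second
  PRINT DATE$
  REM BASIC09 does not zero variables, so initialize the accumulators
  s:=0
  r:=0
  PRINT " Accuracy Random"
  FOR n:=1 TO 100
    a:=n
    REM 1024th root via ten square roots
    FOR i:=1 TO 10
      a:=SQRT(a)
      r:=r+RND(0)
    NEXT i
    REM Back to the 1024th power via ten squarings
    FOR i:=1 TO 10
      a:=a^2
      r:=r+RND(0)
    NEXT i
    s:=s+a
  NEXT n
  PRINT ABS(1010-s/5);" ";ABS(1000-r)
  PRINT DATE$
```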

The output:

    19/05/17 18:05:56
     Accuracy Random
    1.06906891E-03 2.6322155
    19/05/17 18:06:23

So it took somewhere between twenty-six and twenty-eight seconds. That’s between about five and five and a half times as fast as the Color BASIC version, though to be fair we should note that Color BASIC comes up running at 0.89 MHz, while OS-9 (and NitrOS-9, which is what I use now) switches the CoCo 3 to the 1.78 MHz clock rate it is capable of. The “accuracy” result is about 1.8 times that of the Color BASIC version. (On the other hand, the BASIC09 version is almost as fast as the IBM PC and does far better for accuracy.)
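The "between twenty-six and twenty-eight" figure comes from the one-second resolution of the timestamps. Here is a minimal Python sketch of that reasoning; it assumes each stamp is truncated to a whole second, and the helper names `elapsed_bounds` and `speedup_bounds` are mine, not anything from BASIC09 or OS-9.

```python
from datetime import datetime

FMT = "%y/%m/%d %H:%M:%S"  # the yy/mm/dd hh:mm:ss layout DATE$ uses


def elapsed_bounds(start, end):
    """Return (low, high) bounds in seconds on the true elapsed time.

    Assumption: each stamp is truncated to a whole second, so the true
    run time lies within one second either side of the stamp difference.
    """
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    diff = int(delta.total_seconds())
    return max(diff - 1, 0), diff + 1


def speedup_bounds(reference_seconds, low, high):
    """Return (slowest, fastest) speedup relative to a reference run time."""
    return reference_seconds / high, reference_seconds / low


low, high = elapsed_bounds("19/05/17 18:05:56", "19/05/17 18:06:23")
slowest, fastest = speedup_bounds(2 * 60 + 23, low, high)  # 2:23 in Color BASIC
print(low, high)
print(round(slowest, 2), round(fastest, 2))
```

The stamps differ by 27 seconds, so the window is 26 to 28 seconds, and dividing the 143-second Color BASIC time by each end gives the "five to five and a half" range.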

Exponentiation is by far the slowest operation in this benchmark, and the most error-inducing. After all, it’s typically done in three steps: take the logarithm of the base, multiply by the exponent, and raise to the power (each step increasing the error). Given that we’re raising it to a constant integer power, in particular squaring it, the only reason anyone would use exponentiation rather than simple multiplication is to be consistent with the benchmark as it was originally written. Just changing “a:=a^2” to “a:=SQ(a)” (SQ is a BASIC09 function that squares its argument) gives the result

    19/05/17 18:35:07
     Accuracy Random
    9.45091248E-04 .203981891
    19/05/17 18:35:13

decreasing the error as expected while cutting the run time to between five and seven seconds.
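To make the three-step view of exponentiation concrete, here is a rough Python sketch that runs the accuracy half of the benchmark twice: once squaring through a log/multiply/exp routine, once with a plain multiply. The function names are made up for illustration, and I am only assuming that an interpreter's `^` works along these lines. Python uses double precision, so the errors it prints are far smaller than anything a CoCo shows; the point is the shape of the computation, not the figures.

```python
import math


def pow_via_log(base, exponent):
    """Generic exponentiation: take the log, multiply, then exponentiate."""
    return math.exp(exponent * math.log(base))


def ahl_accuracy(square):
    """Accuracy part of Ahl's loop, using `square` for the squaring step."""
    total = 0.0
    for n in range(1, 101):
        a = float(n)
        for _ in range(10):   # 1024th root via ten square roots
            a = math.sqrt(a)
        for _ in range(10):   # back up via ten squarings
            a = square(a)
        total += a
    return abs(1010 - total / 5)


print("log/exp :", ahl_accuracy(lambda a: pow_via_log(a, 2)))
print("multiply:", ahl_accuracy(lambda a: a * a))
```

Each pass through `pow_via_log` does two transcendental function calls where the multiply version does one multiplication, which is where both the time and the extra rounding come from.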

Let’s give Color BASIC a chance to show how it will run with a similar change. With “A=A*A” instead of “A=A^2”, it has an “accuracy” figure of 1.93357468E-04, roughly a third of the value using exponentiation, a “random” value of 7.3876276, and runs in 1:29, almost a minute faster… but note that in the versions without exponentiation, BASIC09 is now between 12.7 and 17.8 times as fast as Color BASIC. Removing the major bottleneck, the slowest arithmetic operation in BASIC, shows off BASIC09’s advantages over Color BASIC that much more, to the point that declaring the loop control variables to be INTEGER instead of letting them default to REAL might be worthwhile.

## *BYTE*’s Prime Sieve

In *BYTE* v.6 #9, Jim Gilbreath wrote about looking for a benchmark and settling on a sieve of Eratosthenes program. Unlike Ahl’s benchmark, the intent was to compare languages, and the article is fun to read for the listings of various versions, e.g. PL/M, Ratfor, Forth, and even COBOL.

There is a BASIC version, but running it in Color BASIC gives an immediate “OM” (out of memory) error thanks to the array of 8192 five-byte floating point values used to hold what are effectively Boolean values. It would have to be rewritten to use an 8192-character string to fit on the CoCo… but Color BASIC strings have a maximum length of 255 characters. All right, how about an array of enough floating point values to be at least 8192 bytes? It’s a thought, and Color BASIC initializes values to zero, but I’ve found no way to get the address of a variable or array in any version of Color BASIC, so I won’t bother.

BASIC09, on the other hand, has a Boolean type, and a Boolean only takes up one byte of memory, so it’s easy to write:

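```
PROCEDURE sieve
  REM Allow subscript 0, as in the BASIC original
  BASE 0
  DIM i,k,prime,count:INTEGER; flags(8191):BOOLEAN
  PRINT DATE$
  PRINT "Only one iteration"
  REM Start with every candidate marked prime
  FOR i:=0 TO SIZE(flags)-1
    flags(i):=TRUE
  NEXT i
  count:=0
  FOR i:=0 TO SIZE(flags)-1
    IF flags(i) THEN
      REM flags(i) stands for the odd number 2*i+3
      prime:=i+i+3
      REM Strike out the odd multiples of this prime
      FOR k:=i+prime TO SIZE(flags)-1 STEP prime
        flags(k):=FALSE
      NEXT k
      count:=count+1
    ENDIF
  NEXT i
  PRINT count;" primes"
  PRINT DATE$
```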

This deserves a little explanation: since the BASIC original uses zero as a subscript, we use the BASIC09 “BASE 0” statement to permit zero subscripts. I’ve recently found out that in Color BASIC, given an array dimensioned with DIM x(3), you actually have a four-element array, with subscripts running from 0 to 3 inclusive. If that’s documented anywhere, I have yet to find it.

And the output:

    19/05/18 07:37:48
    Only one iteration
    1899 primes
    19/05/18 07:38:03

So the run time is somewhere between fourteen and sixteen seconds. The BASIC09 “dir” command shows that the code size is 323 bytes, and the data size is 8229 bytes; support for a reasonable set of primitive types makes the difference between easily fitting in a 64K address space and an OM error.
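As a cross-check on the prime count and the storage argument, here is the same odd-only sieve sketched in Python, with a `bytearray` standing in for the one-byte BOOLEAN array. The function name `sieve_count` is mine, and the storage figures in the last line are just the element counts and sizes mentioned above multiplied out.

```python
def sieve_count(size=8191):
    """Count primes with the odd-only sieve; flags[i] stands for 2*i + 3."""
    flags = bytearray([1]) * size      # one byte per flag, all set to "true"
    count = 0
    for i in range(size):
        if flags[i]:
            prime = i + i + 3
            # Strike out the odd multiples of this prime
            for k in range(i + prime, size, prime):
                flags[k] = 0
            count += 1
    return count


print(sieve_count(), "primes")
# Flag storage alone: five-byte floats versus one-byte Booleans
print(8192 * 5, "bytes as floats;", 8191, "bytes as Booleans")
```

With 8191 flags the candidates run from 3 to 16383, and the count agrees with the 1899 that BASIC09 printed. The float array alone would need 40960 bytes, which goes a long way toward explaining the OM error.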

(By the way, the *BYTE* article has tables of results for various processors and languages, with Z80 systems split out into their own table. The best Z80 time, using a Digital Research PL/I compiler, is fourteen seconds… ERRATUM: I didn’t notice that the PL/I code did ten iterations, and it’s not fair to compare ten iterations in PL/I with one in BASIC09.)

## Thanks

Thanks to Maury Markowitz for asking about benchmarks and to archive.org for the archives of computing magazines.

Fascinating. Thanks for writing this up.

It was educational. Turns out that Color BASIC sets numeric variables to zero when they’re created (and I bet strings to “”), but BASIC09 is like C: uninitialized variables hold random garbage.

BTW, there are results listed for BASIC compilers for the Ahl benchmark. A compiler worth its salt will notice that ^2 can be turned into a simple multiply. This is the same issue that came up with Dhrystone 1, which didn’t do anything with the results of the operations it performed, so that an optimizing compiler could make most of the benchmark go away. There was a Dhrystone 2 that kept that from happening. I used to have a copy of a Motorola white paper taking Intel to task, I believe about citing Dhrystone 1 results for the 80386 in comparisons with the 68020.

For completeness’s sake, I removed the timing info from sieve itself and wrote a wrapper that saved DATE$, ran sieve ten times, and then printed the saved DATE$ value and the current DATE$ value. That cuts the variation in possible time results and puts it on an even playing field with the other versions that do ten iterations. The result: ten iterations took between 114 and 115 seconds, though the range has to be two seconds (in the best case, the starting time is just before the second after what’s shown and the ending time is barely past what’s shown, and the worst case is the other way around). In any case, it means one iteration is between 11.4 and 11.6 seconds, considerably better than what one iteration indicated.