BASIC09 I-code: what it is, and why

I’ve repeatedly referred to I-code in previous articles. Let’s go into some more detail, and in passing correct a mistaken impression I may have given.

What distinguishes I-code from other virtual machine code

As you enter, edit, or load source code with basic09, it is converted to I-code. I-code is commonly likened to code for virtual machines, like the JVM or the P-code machine. Goodness knows I’ve done it myself. I should have at least described the difference between I-code and P-code or Java bytecode, so let’s start there.

I-code isn’t quite the same as those, because it has to serve two masters:

  • basic09, which lets you develop and debug code. It wants to be as interactive as possible, and must be able to show you the code you’ve written. (BASIC09 is the language; basic09 is a program.)
  • runb, which just runs code.

Java and UCSD Pascal exist solely as compilers. IDEs serve the purpose of basic09 for them using the original source code and compiled code, so there’s no need to regenerate source. The virtual machine can, therefore, represent control flow with lower-level conditional and unconditional branches and calls like those we know from actual processors.

So, what are we to do?

  • On one side lies the Scylla of Color BASIC’s approach. Color BASIC, originally running on computers with extremely limited RAM, opts for methods that save data and code space and minimally modify source code, at a dire cost in execution speed.
  • On the other side lies the Charybdis of compilation–a major speed advantage at run time, but we can’t both compile and meet basic09’s needs.

basic09 takes a middle path with I-code:

  • It parses expressions and converts them to reverse Polish notation, a common practice in virtual machines. It converts identifiers to references to their entries in a symbol table, and numeric constants to their internal forms. Line numbers are remembered, but in I-code are turned into offsets to their destination.
  • It converts each of the lines that collectively form statements into constructs that indicate their function and include any expressions they contain so that they can both be executed and listed in a form that almost matches how they were entered. (In at least one case an expression gets moved: the STEP for FOR statements is kept in the construct corresponding to NEXT. After all, that’s where you actually use it.)
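To make the first of those steps concrete, here is a minimal sketch in Python of turning an infix expression into RPN while replacing identifiers with symbol table references. It is an illustration under stated assumptions, not basic09's actual parser: the operator table, the tuple layout, and the names to_rpn and tokenize are all hypothetical, and real I-code is a byte stream, not a Python list.

```python
import re

# Hypothetical operator table: precedence only, everything left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Split an infix expression into tokens; spaces only separate tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_rpn(text, symbols):
    """Convert infix text to RPN, replacing names with symbol table indexes.

    `symbols` holds each name as first typed; lookups ignore case.
    """
    output, stack = [], []
    for tok in tokenize(text):
        if tok.isdigit():
            # Constants are stored in internal form, not as text.
            output.append(("const", int(tok)))
        elif tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-associative).
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(("op", stack.pop()))
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(("op", stack.pop()))
            if not stack:
                raise ValueError("unbalanced parenthesis")
            stack.pop()  # discard the "("; RPN needs no parentheses
        else:
            # Identifier: look it up once, now, instead of at run time.
            lowered = [name.lower() for name in symbols]
            if tok.lower() in lowered:
                index = lowered.index(tok.lower())
            else:
                symbols.append(tok)
                index = len(symbols) - 1
            output.append(("var", index))
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parenthesis")
        output.append(("op", op))
    return output
```

Feeding it "(total + 2) * Count - TOTAL" with an empty symbol list yields seven RPN items and a two-entry symbol table; both spellings of total refer to entry 0, and neither the spaces nor the parentheses survive.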

It thus remembers variable names and line numbers, but only actually uses them to list code or in debug mode where you can examine and set variables. (runb, of course, doesn’t care about them.)

What does basic09 use for its symbol tables? Linear search, binary tree, hash function? I have no idea, but as a BASIC09 user, “Frankly, my dear…” Users of Color BASIC programs get to twiddle their thumbs as the interpreter seeks the same symbols and lines over and over and over at run time, but not me.

“That’s not what I typed!” It’s close enough, and the payoff is worth the difference.

“What’s with that almost?” you ask.

All Color BASIC does to your source code is tokenize keywords and convert line numbers. Your code is otherwise unchanged–that’s why it runs so slowly. basic09 converts what you type to I-code and then discards what you typed. Your code isn’t listed, it’s regenerated…almost.

  • Numeric constants are stored appropriately for their type, and in listings are printed that way as well. For example, if you type a value as 65535, you’ll see it in the listing as 65535. (with a trailing decimal point), because it’s too big to fit in a 16-bit signed two’s complement integer. Also, basic09 will only keep floating point values as accurately as its internal representation permits.
  • Spaces are noticed for their help in marking the ends of tokens, period. There’s nothing in I-code corresponding to them.
  • Case isn’t significant in BASIC09 variable names. Since variable names are stored just once in the symbol table for each procedure, they’ll come out consistent in listing, all matching what you typed for it the first time. (I rather like it–camelCase with minimal shift key!)
  • RPN, the form I-code keeps expressions in, is unambiguous. No parentheses needed, no issues with precedence or binding, and easy to evaluate… but what to do when you list the code? Conversion from RPN to infix gives you two options: parenthesize everything (the approach decode takes) or take operator precedence into account and emit the fewest parentheses required (implementations of this exist in, for example, Algol 68 and C). basic09 does the latter.
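The second option for listing RPN can be sketched in a few lines of Python. This is my own illustration, not basic09's code: the operator table and the name rpn_to_infix are assumptions, tokens are plain strings, and every operator is treated as left-associative.

```python
# Hypothetical operator table; higher numbers bind more tightly.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
ATOM = 3  # operands bind more tightly than any operator


def rpn_to_infix(rpn):
    """Rebuild infix text from RPN tokens, omitting redundant parentheses."""
    stack = []  # entries are (text, precedence of the outermost operator)
    for tok in rpn:
        if tok not in PRECEDENCE:
            stack.append((tok, ATOM))
            continue
        if len(stack) < 2:
            raise ValueError("malformed RPN: operator lacks operands")
        right, right_prec = stack.pop()
        left, left_prec = stack.pop()
        prec = PRECEDENCE[tok]
        # The left operand needs parentheses only if it binds more loosely.
        if left_prec < prec:
            left = f"({left})"
        # With left-associative operators, an equal-precedence right operand
        # keeps its parentheses: a-(b-c) is not a-b-c.
        if right_prec <= prec:
            right = f"({right})"
        stack.append((f"{left}{tok}{right}", prec))
    if len(stack) != 1:
        raise ValueError("malformed RPN: leftover operands")
    return stack[0][0]
```

For example, the RPN sequence a b + c * comes back as (a+b)*c, while a b c * + comes back as a+b*c with no parentheses at all. The sketch is conservative about a+(b+c), which it leaves parenthesized; that also happens to be the safe choice for floating point, where addition is not associative.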

The designers of BASIC09 made basic09 change other things in listings intentionally:

  • basic09 capitalizes keywords; their spelling is kept in one place just as variable names are. Keep mostly to lower case in your identifiers, and the result is far easier to read than all upper case Color BASIC. (Algol 68 users: think “case stropping”.)
  • basic09 indents code for you, at no cost in I-code size. Compare this with the vast expanses of Color BASIC code in RAINBOW, where the programmers intentionally packed code into as few lines and characters as possible, legibility be damned.
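Why does indentation cost nothing in I-code size? Because it can be recomputed from the program's structure every time the code is listed. Here is a rough Python sketch of the idea. It is not basic09's lister: the two-space width, the name list_program, and the keyword sets (loops only; IF and EXITIF…ENDEXIT are left out) are my assumptions, and statements are plain strings rather than I-code.

```python
# Assumed subset of BASIC09 loop keywords; IF and EXITIF are omitted.
OPENERS = {"FOR", "WHILE", "LOOP", "REPEAT"}
CLOSERS = {"NEXT", "ENDWHILE", "ENDLOOP", "UNTIL"}
INDENT = "  "  # assumed indent width


def list_program(statements):
    """Return the statements indented by nesting depth.

    Nothing about layout is stored with the statements; the depth is
    derived from the opening and closing keywords while listing.
    """
    depth = 0
    lines = []
    for stmt in statements:
        words = stmt.split()
        keyword = words[0].upper() if words else ""
        if keyword in CLOSERS:
            depth = max(depth - 1, 0)  # closers line up with their opener
        lines.append(INDENT * depth + stmt)
        if keyword in OPENERS:
            depth += 1
    return lines
```

Given the statements FOR i=1 TO 3, WHILE j<i DO, j=j+1, ENDWHILE, NEXT i, the sketch lists the WHILE and ENDWHILE one level in and the assignment two levels in, however the lines were spaced when typed.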

The main point, aka the “tl;dr” version: basic09 imposes a standard layout. You may chafe at this; back in the day I didn’t like the way it outdented EXITIF…ENDEXIT–but there are good reasons to have a standard layout.

Programmers have spent a lot of time argu^H^H^H^Hdiscussing code layout. There are roughly a dozen indentation styles and variants for C-like languages, and a program (indent) with dozens of knobs, enough to twist to reformat code into any of them. Switching to a new project? Prepare for a new layout standard!

Back in 2009, the designers of the Go programming language chose to skip the brouhaha, bikeshedding, and the burden of adapting to different standards for different projects, and included gofmt in the Go system. Like indent, gofmt reformats Go source. Unlike indent, gofmt has zero options that affect how it formats code; you will get what Go’s designers chose as the standard layout. Projects written in Go often only let you commit code fed through gofmt. No fiddling with layout, no having to shift gears with a new project.

About three decades earlier, the creators of BASIC09 decided the structure of I-code, whence followed how it’s listed–perhaps not for the same reason as the designers of Go, but with the same result.

Outro and thanks

Silly me! I thought I could get through I-code in one article, but at best I’ve just given some motivation and told you how I-code isn’t quite the same as Java bytecode or P-code. So, stay tuned for more.

I would be remiss were I not to thank Wayne, the author of “The Structure of I-Code” on the TRS-80 CoCo Wiki and of decode. The former is, as far as I know, the only publicly-available documentation of I-code. The latter is an I-code decompiler. Were it not for them, this article, and eventually its successors, wouldn’t exist. Thanks, Wayne.

Author: James Jones
Programmer since 1971. First heard of a Unix-like OS for 6809-based computers called OS-9 in the early 1980s, and went to work for a fellow so I could use his Smoke Signal Broadcasting Chieftain and learn more about OS-9. Eventually Microware Systems Corporation, the creators of OS-9, offered me a job, and I spent fifteen years mostly working on compilers. In passing got a Tandy CoCo 3, an MM/1, an MM/1a, and a VME-bus box to run OS-9/68000. After a brief flirtation with OS/2 Warp, I switched to Linux and haven't looked back--much--though I will use Windows under duress--er, when necessary. Now I enjoy this era in which skilled hobbyists can do things we'd have killed for back in the day for insanely little money (even ignoring how little dollars are worth now)--I hope that now we will get the kind of computer the CoCo could have become.
