NOTE: If what I describe in this article applies to other implementations of BASIC that use reserved string space, please leave a comment and let me know.
I was in junior high school when I started learning to program BASIC. Initially, this was done by reading a book a classmate had and then writing programs out on paper. We’d then go in to a Radio Shack and type them in on a TRS-80 Model 3.
I really started to learn after my dad got me a $299.99 Commodore VIC-20. I stayed up all night going through the manual and typing in examples.
A year later I would do it all again with the manuals that came with my next computer, a 64K Radio Shack TRS-80 Color Computer.
Since I had learned under Radio Shack Model III BASIC, CBM BASIC V2, and Extended Color BASIC, I was aware that different BASICs had different commands and features. The VIC-20 used “GET A$” while the CoCo used “A$=INKEY$”. Same dance, different song.
When I began exploring my old VIC-20 programs a few years ago, I dove back in to CBM BASIC and learned there were more differences than I was aware of back in 1982. For example, CBM BASIC seemed to allow declaring as many strings as you wanted until you were out of memory. On Color BASIC, string memory was pre-allocated and defaulted to 200 bytes. You used the CLEAR command if you wanted to increase or decrease that amount.
If Microsoft did both variants of BASIC, why such different string handling?
Color BASIC string storage
On the CoCo, memory looks something like this:
+---------------+
| BASIC ROM use | 0-1023
+---------------+
| 32x16 text    | 1024-1535
+---------------+
| DISK ROM use  | (if present)
+---------------+
| EXT BAS gfx   | (if present)
+---------------+
| BASIC Program |
+---------------+
| Variables     |
+---------------+
| Arrays        |
+---------------+
|               |
|               |
+---------------+
| String Space  | (200 bytes, default)
+---------------+
Variables that are strings and Arrays that are strings will have entries that point to the actual string data elsewhere in memory — either in String Space or as constant string data in the program itself. For example, typing this in a program:
10 A$="This string will be in program space."
…will consume 0 bytes from the String Space area. Those types of hard-coded strings work even if you set String Space to none using CLEAR 0. But, the moment you want to modify the string using LEFT$, RIGHT$, etc., it will be copied in to String Space memory so the modifications can be done.
If you type this directly as a command:
A$="This string will be in string space."
…it will go to String Space since there is no program to hold those characters. If you want to force a constant string to be in string space, you can simply add an empty string ("") to it, like this:
10 A$="This string will be in string space."+""
By doing that addition/concatenation with plus, a new string has to be made that will hold the original string plus the new one. BASIC doesn’t bother to check to see how large the new string is — which is 0 in this case. BASIC could have been much smarter about a lot of things.
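To check where a string’s data actually lives, a short test along these lines may help. It is only a sketch: it assumes Extended Color BASIC (for VARPTR) and the usual 5-byte string descriptor, with the length in the first byte and the data address in the third and fourth bytes. It also reads the FRETOP pointer at locations 33 and 34 to find where String Space begins. The variable names (P, AD, FT) are made up for illustration.

```basic
0 ' WHERE.BAS - WHERE DOES THE STRING DATA LIVE?
10 A$="IN PROGRAM"
20 B$="IN STRING SPACE"+""
25 ' FT = FRETOP, ONE BYTE BELOW STRING SPACE
30 FT=PEEK(33)*256+PEEK(34)
40 P=VARPTR(A$):GOSUB 100
50 P=VARPTR(B$):GOSUB 100
60 END
100 ' P = ADDRESS OF STRING DESCRIPTOR (ASSUMED LAYOUT)
110 AD=PEEK(P+2)*256+PEEK(P+3)
120 PRINT "LEN";PEEK(P);"AT";AD;
130 IF AD>FT THEN PRINT " STRING SPACE" ELSE PRINT " PROGRAM"
140 RETURN
```

If the assumptions hold, A$ should report an address inside the BASIC program and B$ an address above FRETOP.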
But where does it all go?
In the first 1K of RAM that is reserved for us by the BASIC ROMs there is a series of 2-byte memory locations that BASIC uses to track string space. There is an excellent book that came out in the 1980s called Color BASIC Unraveled. It was a commented disassembly of the BASIC ROM, and other editions came out for Extended, Disk and CoCo 3 BASIC. The original book listed these areas of interest:
The first column is a line number in this source listing, followed by the memory location where the code (or reserved memory in this case) is. We see that TXTTAB (“beginning of BASIC program”) is two bytes starting at hex &H19 (25 in decimal). To see where a BASIC program starts, we can type:
PRINT PEEK(25)*256+PEEK(26)
Next is VARTAB (“start of variables”) at hex &H1B which, as our memory map showed, is directly after the BASIC program. Thus, to see how large a BASIC program is, you could subtract the two:
PRINT (PEEK(27)*256+PEEK(28)) - (PEEK(25)*256+PEEK(26))
This is great fun at parties, but not important for this article.
Instead, we will look at three near the end: FRETOP (“start of string storage”), STRTAB (“start of string variables”) and MEMSIZ (“top of string space”).
FRETOP
FRETOP (&H21 / 33) will be where the reserved String Space starts. This memory is protected from BASIC, so the BASIC program, variables and arrays cannot grow past this location.
MEMSIZ
MEMSIZ (&H27 / 39) is the end of that String Space area, and it is always at the end of BASIC memory. On the CoCo, that is 4094 on a 4K CoCo, 16382 on a 16K CoCo, and 32766 on either a 32K or a 64K CoCo.
If you are paying attention you may have noticed two things: 1) the memory location seems to be one byte short of what 4K, 16K or 32K should be, and 2) on a 32K or 64K CoCo, you only get the first half of memory for BASIC.
Both are true. The reason for the first is “I don’t know*.” The reason for the second is “because ROMs and 1980 design and compatibility and reasons.”
I did find it interesting that on a Color Computer 3 (released in 1986 with 128K, expandable to 512K), the value is 32767. Someone at Microware, where the enhancements to Microsoft BASIC were done, must have fixed this and given us that one extra byte. And to think, we spent all those years telling folks that upgrading to a 128K CoCo 3 wouldn’t give you any more memory for BASIC.
But I digress…
Those two locations mark the start and end of the reserved string space. (Well, almost: FRETOP actually points to one byte before string space begins, since it is really the “top of free RAM” and represents the last byte BASIC could use.)
STRTAB
STRTAB (&H23 / 35) marks the address inside that area of memory where the next string will be added. If string space is empty, it will be the same as MEMSIZ. If every last byte of string space is taken, it will be FRETOP.
These values are used by BASIC to know where to insert new strings, or tell if you just need to be shown that wonderful ?OS ERROR (out of string space) message.
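Using the same PEEK trick as before, these pointers can be turned into byte counts. This is a rough sketch that relies only on the three locations described above. The first line prints the total reserved String Space (MEMSIZ minus FRETOP) and the second prints how much of it is still unused (STRTAB minus FRETOP):

```basic
PRINT (PEEK(39)*256+PEEK(40))-(PEEK(33)*256+PEEK(34))
PRINT (PEEK(35)*256+PEEK(36))-(PEEK(33)*256+PEEK(34))
```

With the default setting, the first value should come out to 200. The second value does not count orphaned strings that garbage collection has not reclaimed yet, so it can understate what is really available.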
Looking at String Space
Here is a simple example of what string space looks like. In this example, it is showing what would have been a CLEAR 16 with only 16 bytes reserved for String Space. I did this just so it would be simple to look at:
FRETOP                                    MEMSIZ
 |                                            |
[.][.][.][.][.][.][.][.][.][.][.][.][.][.][.][.]
                                              |
                                          STRTAB
When a new string is created, BASIC will move STRTAB down by the length of the string, then copy the string in to memory. The variable table elsewhere (after the BASIC program) will have an entry with the size of the string and that memory location. Let’s add a ten byte string:
A$="1234567890"
FRETOP                                    MEMSIZ
 |                                            |
[.][.][.][.][.][.][1][2][3][4][5][6][7][8][9][0]
                |
              STRTAB
We could then create a new 4 byte string:
B$="ABCD"
FRETOP                                    MEMSIZ
 |                                            |
[.][.][A][B][C][D][1][2][3][4][5][6][7][8][9][0]
    |
  STRTAB
And now we have only 2 bytes left. We could do C$="XY" and that would work, but if we tried C$="XYZ" we would receive ?OS ERROR.
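The 16-byte walkthrough can be tried as a small program. This is a sketch under a couple of assumptions: the +"" trick is used to force each string into String Space, and free space is taken to be STRTAB minus FRETOP, read from locations 35/36 and 33/34.

```basic
0 ' OSERROR.BAS
10 CLEAR 16
20 A$="1234567890"+""
30 B$="ABCD"+""
40 GOSUB 100
50 C$="XY"+""
60 GOSUB 100
65 ' NO ROOM LEFT FOR THREE MORE BYTES
70 C$="XYZ"+""
80 END
100 ' SHOW FREE STRING SPACE
110 PRINT "FREE:";(PEEK(35)*256+PEEK(36))-(PEEK(33)*256+PEEK(34))
120 RETURN
```

Going by the walkthrough, this should report 2 bytes free, then 0, and then stop at line 70 with ?OS ERROR. The old C$ is still in use while the new one is being built, so there is nothing for BASIC to reclaim.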
Trash Day
When strings are replaced with shorter strings (such as A$="HI"), or empty strings (such as B$=""), those new strings will first be added to string memory if there is room. If there is, the variable table entry will be updated to point to the new string. The bytes used by the old string will still be there, but nothing points to them so they are now considered free/unused.
When a string needs to be allocated and there isn’t enough room between FRETOP and STRTAB, BASIC will attempt garbage collection. It will look for any free/unused string memory, then pack all the used strings down and adjust pointers accordingly. If there are a bunch of strings in use, the BASIC program will pause for a moment while this happens.
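One way to catch garbage collection in the act is to keep replacing the same string while printing the free byte count. This is a minimal sketch, assuming a deliberately small CLEAR 50 so the space fills quickly, and again taking free space to be STRTAB minus FRETOP:

```basic
0 ' GCWATCH.BAS
10 CLEAR 50
20 FOR I=1 TO 12
25 ' EACH PASS MAKES A NEW 10-BYTE COPY AND ORPHANS THE OLD ONE
30 A$="1234567890"+""
40 PRINT I;(PEEK(35)*256+PEEK(36))-(PEEK(33)*256+PEEK(34))
50 NEXT I
```

The free count should fall by 10 on each pass until nothing is left, then jump back up on the pass where BASIC has to collect the orphaned copies before it can store the new one.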
Too. Many. Words.
So rather than write about this, here is a simple program that lets us see the string memory and watch what happens as strings are replaced.
0 ' STRSPACE.BAS
10 CLS
20 GOSUB 200
30 PRINT@320,"TYPE A STRING:":PRINT:PRINT:PRINT:PRINT
40 PRINT@352,;:INPUT A$:GOSUB 100
50 GOTO 20
100 ' SHOW STRINGS
110 GOSUB 200:VR=1088:L=FT+1
120 IF L<=ST THEN C=96:GOTO 160
130 C=PEEK(L):IF C<32 THEN C=96:GOTO 160
140 IF C<64 THEN C=C+64:GOTO 160
150 IF C>=96 THEN C=C-96
160 POKE VR,C
170 VR=VR+1:IF VR>=1088+288 THEN RETURN
180 L=L+1:IF L>MS THEN RETURN
190 GOTO 120
200 ' GET/SHOW STRING INFO
201 ' FT = FRETOP
202 ' ST = STRTAB
203 ' MS = MEMSIZ
210 FT=PEEK(33)*256+PEEK(34)
220 ST=PEEK(35)*256+PEEK(36)
230 MS=PEEK(39)*256+PEEK(40)
240 PRINT@0,"FT:";FT;" ST:";ST;" MS:";MS;
250 RETURN
This program will INPUT A$, then display the (default) 200 byte string buffer. Each time you enter a new line, the old A$ is orphaned (marked free/unused) and the new A$ will be added at the end. When you finally fill up all 200 bytes, BASIC will clean up the string space, clearing out everything but the current A$ (which is in use, but about to be replaced), and then add the new A$, orphaning the old one.
It’s pretty neat to see it actually work. And, with this routine, you could actually write a little demo to test specific things, like this:
0 ' STRSPAC2.BAS
10 CLS
20 GOSUB 200
30 A$="1234567890123456789012345678901234567890"+"":GOSUB 100
31 B$="ABCDEFGHIJKLMNOPQRSTUVWXYZ"+"":GOSUB 100
32 C$="abcdefghijklmnopqrstuvwxyz"+"":GOSUB 100
33 A$="":GOSUB 100
34 D$="********************************"+"":GOSUB 100
35 E$="================================"+"":GOSUB 100
36 F$="ARE WE OUT OF STRING SPACE YET? IF NOT I WILL ADD MORE"+"":GOSUB 100
37 G$="HELLO"+"":GOSUB 100
99 PRINT@352:END
This code would add A$, B$ and C$. Then A$ would be set to an empty string, making the old A$ data available for purging at the next garbage collection time. D$, E$ and F$ are added, and during this, garbage collection will be needed and you’ll see the old A$ disappear. Finally, a short G$ is added to the end.
…or any other combination of adding strings or replacing strings that you’d like to see happen.
And yes, this routine is very very slow, so if anyone is interested, I’ll do a follow-up article with the display routine done in assembly language.
Conclusion
Is this useful? Not really. Most of us just use a CLEAR value large enough that our program won’t crash when using strings and get on with it.
But … in 1983 I wrote a cassette-based BBS (bulletin board system) called *ALLRAM* BBS. It stored the userlog and messages in RAM. At the end of the day, the SysOp (system operator) could save them to tape, and reload them the next time the BBS was run. I had to make sure that even at 100% (every userlog entry full to the longest strings, every message line full to the longest lines) the BBS would not crash. I had to calculate string usage for worst case. But, had I understood how BASIC worked back then, I could have made a much smarter routine that would make the code automatically purge older messages when string space was about to run out. (I actually have an article I am working on for my website that deals with this.)
I guess the conclusion is … maybe it might be useful?
I definitely think it is kind of cool to see what is going on. I had no idea any of this went on, and didn’t care, when I was learning BASIC back in the early 1980s.
If you find this article kind of cool, consider buying the hosts of this website a cup of coffee. (That’s their donation page. Give it a click sometime.)
Until next time…
* I don’t know, but William “Lost Wizard” Astle did.
The BASIC keyword “VAL” turns a string in to a number, and one of the routines it uses scans a string that it expects to be terminated with a 0 at the end. Strings in String Space have no such termination, but apparently that’s how this routine operates. In order for it to work on a string stored at the very end of String Space, it temporarily POKEs the byte after string space to a 0, thus creating a 0-terminated string. After the routine is over, it restores the original byte. If String Space had gone to the last byte in RAM, the next byte would have been ROM and this would not work. Microsoft left that byte free for this purpose.
When Microware patched Microsoft Color, Extended and Disk BASIC for the CoCo 3, they replaced the RAM scan that Color BASIC used with a different one, and either intentionally “fixed” this, or didn’t know it was done that way for a reason. On the CoCo 3, the BASIC ROMs are copied from ROM in to RAM and then patched, so the entire 64K is running out of RAM, and thus VAL is temporarily patching the first byte of the Extended BASIC ROM at 32768 (&H8000). Since no active code is there, it does no harm and “just happens to work.”
Either clever, or an oversight, but no harm done, and as a consequence, I guess we get that one extra byte for our programs. ;-)